FARVEL BIG TECH

If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

llm
26 posts, 24 posters, 78 views
  • moink@fedi.splitbrain.org

    @pseudonym That and LLM code often looks very nice on the surface so it takes a lot of vigilance and thinking to find the subtle errors. Code from juniors tends to have more immediate signs of errors or wrong mental models.

    wronglang@bayes.club wrote (#21):

    @moink @pseudonym one of the benefits of people *having* a mental model

    • hopeless@mas.to

      @pseudonym It's certainly like that.

      FWIW though LLMs don't have any shame or feeling they need to manage their reputation.

      If you tell the same LLM that produced the report that it is now the QA manager and it must review the report from the standpoints of checking for missing or inaccurate citations, dubious claims or non-concise text, it will rat itself out and can be told to fix what it found.

      This is the same LLM entirely...
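
A minimal sketch of the draft, review, fix loop described above. Everything here is an assumption for illustration: `Model` is a hypothetical callable that takes a prompt string and returns a reply string, not any real LLM API, and the prompt wording and the "OK" stop signal are made up.

```python
from typing import Callable

# Hypothetical interface: prompt text in, reply text out. Not a real LLM API.
Model = Callable[[str], str]

REVIEW_PROMPT = (
    "You are now the QA manager. Review the report below for missing or "
    "inaccurate citations, dubious claims, and non-concise text. "
    "Reply with 'OK' if you find nothing.\n\n"
)


def draft_review_fix(model: Model, task: str, max_rounds: int = 3) -> str:
    """Ask the same model to draft, then review and fix its own report."""
    report = model(task)
    for _ in range(max_rounds):
        findings = model(REVIEW_PROMPT + report)
        if findings.strip() == "OK":
            break  # the reviewer role reported nothing further
        report = model(
            "Fix these issues:\n" + findings + "\n\nReport:\n" + report
        )
    return report
```

The loop stops when the reviewer role reports nothing or after `max_rounds` passes, whichever comes first. It says nothing about whether the final report is actually correct.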

      nor4@chaos.social wrote (#22):

      @hopeless @pseudonym you are suggesting that you can just layer more shit onto the shit and after enough layers of shit it becomes not shit.

      • pseudonym@mastodon.online

        If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

        That's a cognitively brutal task.

        Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

        I propose any productivity gains will be consumed by false negative review failures.
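
A back-of-the-envelope sketch of that last claim. All numbers are invented for illustration and come from nowhere in this thread: the volumes, the error rate, and the reviewer miss rates are assumptions, and `missed_errors` is a hypothetical helper.

```python
def missed_errors(units_reviewed: int, error_rate: float, miss_rate: float) -> float:
    """Expected number of serious errors that slip past the reviewer."""
    return units_reviewed * error_rate * miss_rate


# Assumed junior workflow: 100 changes, 2% carry a serious error,
# and the reviewer misses 10% of those.
junior = missed_errors(100, 0.02, 0.10)

# Assumed LLM workflow: 10x the volume at the same error rate, with the
# reviewer's miss rate rising to 30% as vigilance degrades.
llm = missed_errors(1000, 0.02, 0.30)

print(f"junior: {junior:.1f} missed, LLM: {llm:.1f} missed")
```

Under these assumptions, missed errors grow faster than output does, because both the volume and the miss rate rise together.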

        dtwx@mastodon.social wrote (#23):

        @pseudonym also, when the senior retires, who replaces them?


          max@mas.lab4.app wrote (#24):

          @pseudonym This, 100%. The Glass Cage by Nicholas Carr dives into this in depth with examples from aviation, and how full automation of flight makes it harder for pilots to recover from a disaster situation.


            deborahh@cosocial.ca wrote (#25):

            @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨


              nuintari@mastodon.bsd.cafe wrote (#26):

              @pseudonym We are using AI in exactly the worst ways possible.

              Caveat: I am a never AI-er, due to the ethical issues surrounding how training data is gathered, the severe ecological and economic impacts, and the fact that deepfakes are objectively making the world a shittier place.

              But pretend for a second, none of those are a problem anymore. We are still using AI wrong. You don't have it produce a mountain of code and have a human review it. You still use humans to produce the code, and have AI help other humans to review it. AI isn't terribly good at writing code, but it has been shown to be effective at finding a few classes of bugs humans are typically very bad at finding.

              But that won't allow you to fire people and replace them with monkeys on typewriters, so it'll never happen.

              • jwcph@helvede.net shared this topic