FARVEL BIG TECH

If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

llm
26 posts, 24 posters, 78 views
  • pseudonym@mastodon.online

    If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

    That's a cognitively brutal task.

    Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

    I propose any productivity gains will be consumed by false negative review failures.
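The arithmetic behind this claim can be sketched roughly as follows. This is a toy model, not a measurement: the function name `escaped_defects` and every number below are assumptions chosen only to show the shape of the argument, namely that escaped errors scale with volume and grow further if the reviewer's detection rate drops under load.

```python
def escaped_defects(units_reviewed: int, defect_rate: float, detection_rate: float) -> float:
    """Expected number of defects that pass review unnoticed (false negatives)."""
    return units_reviewed * defect_rate * (1.0 - detection_rate)


# Hypothetical figures, for illustration only.
junior = escaped_defects(units_reviewed=1_000, defect_rate=0.01, detection_rate=0.9)

# Same defect rate and same reviewer vigilance, but ten times the output.
llm_same_vigilance = escaped_defects(units_reviewed=10_000, defect_rate=0.01, detection_rate=0.9)

# Ten times the output and a reviewer whose detection rate has dropped.
llm_fatigued = escaped_defects(units_reviewed=10_000, defect_rate=0.01, detection_rate=0.7)

print(junior, llm_same_vigilance, llm_fatigued)
```

Under these assumed numbers, the escaped defects grow in proportion to volume even when vigilance holds, and faster when it does not.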

    koen_hufkens@mastodon.social wrote (#10):

    @pseudonym Amen to that. I don't even trust myself using one for this exact reason. At 10x the speed you will zip by your own mistakes.


      shanecelis@mastodon.gamedev.place wrote (#11):

      @pseudonym TIRED: 10x developer

      HIRED: 10x junior intern

      ALSO TIRED: Senior developer reviewing junior's copious output.


        tristan@sns.tcl.me wrote (#12):

        @pseudonym Recent Microsoft update releases seem to be a great case study for that


          moink@fedi.splitbrain.org wrote (#13):

          @pseudonym That and LLM code often looks very nice on the surface so it takes a lot of vigilance and thinking to find the subtle errors. Code from juniors tends to have more immediate signs of errors or wrong mental models.

          • xrisk@social.treehouse.systems

            @pseudonym is the problem the increased volume of code that the LLM is producing (as compared to the junior dev), what you are calling "productivity gains"? Because I can see this same argument being made for code produced by humans as well.

            madhu_shrieks@mastinsaan.in wrote (#14):

            @xrisk @mehluv might be able to provide more insight on this, but at least when I was writing content and AI was getting integrated into our work, our editors were expected to review a high volume of written content much faster. And we made many fuck-ups because of that, because it is overwhelming. I assume this might also be the case here, but I might be fully wrong. It is not just that the volume of code written is high; the expected pace of reviewing is also accelerated. Because what is the point of automating stuff if the reviewing process neutralizes the gains?


              malstrom@metalhead.club wrote (#15):

              @xrisk @pseudonym Volume is a key factor here. But even if the volume was the same, LLMs are doomed to stagnate as devs—whose code was scraped for training data—are displaced.


                ada@beige.party wrote (#16):

                @pseudonym That is why they don't replace juniors in aviation, nuclear, and radiology, only in non-critical industries.

                If the cost of a potential failure times the estimated failure rate is smaller than the total labour cost of screening, interviewing, and training juniors, plus firing cultural misfits, then business replaces them.

                Not only does it save HR operating costs and internal training costs, the business can also pin a mistake on the senior reviewer.

                And the review model has a positive productivity trajectory, since models have a stable improvement curve, unlike humans.
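The break-even rule stated in this post can be written down as a one-line comparison. The sketch below is only that rule made explicit; the function name `should_replace` and all figures are hypothetical, and it deliberately ignores everything the rule itself ignores (reviewer fatigue, the lost pipeline of future seniors, and so on).

```python
def should_replace(failure_cost: float, failure_rate: float, junior_labour_cost: float) -> bool:
    """Rule from the post: replace when expected failure cost < total junior labour cost."""
    expected_failure_cost = failure_cost * failure_rate
    return expected_failure_cost < junior_labour_cost


# Hypothetical figures, for illustration only.
# Non-critical product: failures are cheap, so the rule says "replace".
non_critical = should_replace(failure_cost=50_000, failure_rate=0.1, junior_labour_cost=80_000)

# Safety-critical domain: one failure is enormously expensive, so the rule says "keep".
safety_critical = should_replace(failure_cost=100_000_000, failure_rate=0.01, junior_labour_cost=80_000)

print(non_critical, safety_critical)
```

The outcome depends entirely on the estimated failure rate, which is exactly the quantity the vigilance argument at the top of the thread says is hard to keep low.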


                  xrisk@social.treehouse.systems wrote (#17):

                  @malstrom @pseudonym that’s an interesting claim. I don’t know enough about LLM research to make a judgement. I do know that LLMs trained on synthetic (other LLM-generated) data tend to perform worse, but have we reached the limits of what LLMs are capable of? In my limited understanding, if an LLM can “learn” fundamental programming “concepts” (the same way they can “learn” concepts across human languages — I could be wrong in my understanding here), they should (might?) be able to transfer/apply those concepts to not-before-seen domains (maybe with a bit of “reasoning” prodded in).


                    moutmout@framapiaf.org wrote (#18):

                    @pseudonym This.

                    I do a lot of "computer science labs", where students learn to write code, and they wave me down when they have questions. When their code doesn't do what they expect, it's often easy to figure out what went wrong because you can spot a bit of code that looks funky. And usually, the problem is in those few lines.

                    LLM code is meant to look like good code, so you don't get these little shortcuts.


                      toldtheworld@mastodon.social wrote (#19):

                      @pseudonym I have posed this conundrum before and the answer I received is that there is also an opportunity cost to not moving faster and the risk of a catastrophic bug may not outweigh the risk of being overtaken by competitors, especially since that was already happening before LLMs anyway.

                      Also, it *seems* models are improving at detecting these bugs, so they are being used to review changes, which, for the reasons you point out, they might be better at than people.


                        wronglang@bayes.club wrote (#20):

                        @xrisk @malstrom @pseudonym just for clarity, LLMs don't learn concepts


                          wronglang@bayes.club wrote (#21):

                          @moink @pseudonym one of the benefits of people *having* a mental model

                          • hopeless@mas.to

                            @pseudonym It's certainly like that.

                            FWIW though LLMs don't have any shame or feeling they need to manage their reputation.

                            If you tell the same LLM that produced the report that it is now the QA manager and it must review the report from the standpoints of checking for missing or inaccurate citations, dubious claims or non-concise text, it will rat itself out and can be told to fix what it found.

                            This is the same LLM entirely...
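The draft-then-self-review loop described in this post might look roughly like the sketch below. Everything here is an assumption for illustration: `Model` is a hypothetical callable that takes a prompt and returns text (no real LLM API is implied), `draft_review_fix` and `stub_model` are made-up names, and the stub exists only so the control flow can run. Whether a real model's second pass reliably catches its own errors is the point under dispute in this thread, and the sketch does not settle it.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a text reply.
Model = Callable[[str], str]

QA_PROMPT = (
    "You are now the QA manager. Review the report below for missing or "
    "inaccurate citations, dubious claims, and non-concise text. "
    "Reply with 'OK' if you find nothing.\n\n"
)


def draft_review_fix(model: Model, task: str, max_rounds: int = 2) -> str:
    """Draft a report, then ask the same model to review and fix it."""
    report = model(task)
    for _ in range(max_rounds):
        findings = model(QA_PROMPT + report)
        if findings.strip() == "OK":
            break  # the reviewer pass found nothing further
        report = model(f"Fix these issues:\n{findings}\n\nReport:\n{report}")
    return report


def stub_model(prompt: str) -> str:
    """Stand-in for a real model so the loop can be exercised offline."""
    if prompt.startswith("You are now the QA manager"):
        return "OK" if "[cited]" in prompt else "Claim lacks a citation."
    if prompt.startswith("Fix these issues"):
        return "LLM review is hard. [cited]"
    return "LLM review is hard."


final_report = draft_review_fix(stub_model, "Write a short report.")
print(final_report)
```

The `max_rounds` cap is a design choice of the sketch: without it, a reviewer pass that never answers "OK" would loop forever.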

                            nor4@chaos.social wrote (#22):

                            @hopeless @pseudonym you are suggesting that you can just layer more shit onto the shit and after enough layers of shit it becomes not shit.


                              dtwx@mastodon.social wrote (#23):

                              @pseudonym also, when the senior retires, who replaces them?


                                max@mas.lab4.app wrote (#24):

                                @pseudonym This, 100%. The Glass Cage by Nicholas Carr covers this in depth with examples from aviation, including how full automation of flight makes it harder for pilots to recover from a disaster situation.


                                  deborahh@cosocial.ca wrote (#25):

                                  @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨


                                    nuintari@mastodon.bsd.cafe wrote (#26):

                                    @pseudonym We are using AI in exactly the worst ways possible.

                                    Caveat: I am a never AI-er, due to the ethical issues surrounding how training data is gathered, the severe ecological and economic impacts, and the fact that deepfakes are objectively making the world a shittier place.

                                    But pretend for a second, none of those are a problem anymore. We are still using AI wrong. You don't have it produce a mountain of code and have a human review it. You still use humans to produce the code, and have AI help other humans to review it. AI isn't terribly good at writing code, but it has been shown to be effective at finding a few classes of bugs humans are typically very bad at finding.

                                    But that won't allow you to fire people and replace them with monkeys on typewriters, so it'll never happen.

                                    • jwcph@helvede.net shared this topic