FARVEL BIG TECH

We'll see how I feel in the morning, but for now I seem to have convinced myself to actually read that fuckin Anthropic paper

  • enigma@norden.social (#59), following up on their earlier post:

        @jenniferplusplus @glyph I'm older too, and I often compare AI with the moon landing of the 1960s, when AI also started professionally at MIT in the USA. The most confusing question about Apollo's success was: now that we have reached this goal that millions dreamed about, what do we want there? And what is our next stepping stone?

    @jenniferplusplus @glyph My beloved fantasy and sci-fi book was and is Solaris by Stanislaw Lem (Poland, 1961):
    https://en.wikipedia.org/wiki/Solaris_%28novel%29

    A mystic ocean on a distant planet that materializes the traumas of human minds. The astronauts there suffer from the loss of a deceased child or partner, e.g. by suicide.
    The upshot is that humanity tries to push its frontiers as far as possible to escape the daily routine of Earth, and only ends up facing itself, as in a mirror of the mind.
  • jenniferplusplus@hachyderm.io (#60), replying to r343l@freeradical.zone, who wrote:

        @jenniferplusplus @glyph I had only read the Anthropic summary. I was struck by how, even if all their methods and study design were great (and a good sample, etc.), the results seemed very much to indicate that LLM use isn't as transformative as the hype, with major risks of deskilling. I was surprised they published it, just from reading their own summary. I guess they had to make lemonade from lemons??

    @r343l @glyph
    As I've learned, they did some preregistration for the study. That might have influenced them.

    And a whole bunch of these AI researchers really do seem to think of themselves as serious scientists doing important work. Particularly at Anthropic, as that's where a lot of the true believers ended up.
  • jenniferplusplus@hachyderm.io (#61), continuing from an earlier post:

        Chapter 4. Methods.

        Let's go.

        First, the task. It's, uh. It's basically a shitty whiteboard coding interview. The assignment is to build a couple of demo projects for an async Python library. One is a non-blocking ticker. The other is some I/O ("record retrieval"; not clear if this is the local filesystem or what, but probably the local fs) with handling for missing files.

        Both are implemented in a literal whiteboard coding interview tool. The test group gets an AI chatbot button and encouragement to use it. The control group doesn't.

        /sigh

        I just. Come on. If you were serious about this, it would be pocket change to do an actual study.

    Found it! n=52. wtf. I reiterate: 20 billion dollars, just for this current funding round, and they only managed to do this study with 52 people.

    But anyway, let's return to the methods themselves. They start with the design of the evaluation component, so I will too. It's organized around 4 evaluative practices they say are common in CS education. That seems fine, but their explanation for why these things are relevant is weird.

    1. Debugging. According to them, "this skill is crucial for detecting when AI-generated code is incorrect and understanding why it fails."

    Maybe their definition is more expansive than it seems here? But it's been my experience, professionally, that this is just not the case. The only even sort-of reliable mechanism for detecting and understanding the shit behavior of slop code is extensive validation suites.
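The "extensive validation suites" point is worth making concrete. Below is a minimal sketch of the kind of behavioral test meant here, written against a hypothetical retrieve_record helper; the names and the spec are invented for illustration and are not taken from the paper's tasks. The idea is that the contract is pinned down by the tests, so a violation shows up as a failing test instead of relying on a reviewer to notice it while reading.

```python
# Hypothetical example: a small "validation suite" for a record-reading helper.
# Spec (invented): return the record file's contents, or None if it is missing.

def retrieve_record(path: str) -> str | None:
    """Return the contents of the record at `path`, or None if it does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None


# Run with pytest; `tmp_path` is pytest's built-in temporary-directory fixture.
def test_missing_record_returns_none(tmp_path):
    # Missing records are a normal outcome per the spec, not an exception;
    # exactly the kind of contract detail that is easy to miss when skimming
    # generated code, and trivial for a test to catch.
    assert retrieve_record(str(tmp_path / "nope.json")) is None


def test_existing_record_returns_contents(tmp_path):
    record = tmp_path / "r1.json"
    record.write_text('{"id": 1}')
    assert retrieve_record(str(record)) == '{"id": 1}'
```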

  • jenniferplusplus@hachyderm.io (#62), in reply to #61:

    2. Code Reading. "This skill enables humans to understand and verify AI-written code before deployment."

    Again, not in my professional experience. It's just too voluminous and bland. And no one has time for that shit, even if they can make themselves do it. Plus, I haven't found anyone who can properly review slop code, because we can't operate without the assumptions of comprehension, intention, and good faith that simply do not hold in that case.
  • jonny@neuromatch.social (#63), in reply to #62:

    @jenniferplusplus The latter part is especially true, and I don't have any sort of strategy for handling it. I have to read every single line of LLM code because the space of possible mistakes it can make is so large. With humans, even if someone really doesn't know what they are doing, there are only so many kinds of things they could conceivably screw up.
  • jenniferplusplus@hachyderm.io (#64), in reply to #62:

    3. Code writing. Honestly, I don't get the impression they even understand what this means. They say, "Low-level code writing, like remembering the syntax of functions, will be less important with further integration of AI coding tools than high-level system design."

    Neither of those things is a meaningful facet of actually writing code. Writing code exists entirely in between those two things. Code completion tools basically eliminate having to think about syntax (but we will return to this). And system design happens in the realm of abstract behaviors and responsibilities.
  • jenniferplusplus@hachyderm.io (#65), in reply to #64:

    4. Conceptual. As they put it, "Conceptual understanding is critical to assess whether AI-generated code uses appropriate design patterns that adheres to how the library should be used."

    IIIIIII guess. That's not wrong, exactly? But it's such a reverse-centaur worldview. I don't want to be the conceptual bounds checker for the code extruder. And I don't understand why they don't understand that.
  • mattly@hachyderm.io (#66), in reply to #65:

    @jenniferplusplus They don't understand it because their job depends on them not understanding it.
  • jenniferplusplus@hachyderm.io (#67), in reply to #65:

    So anyway, all of this is, apparently, in service to the "original motivation of developing and retaining the skills required for supervising automation."

    Which would be cool. I'd like to read that study, because it isn't this one. This study is about whether the tools used to rapidly spit out meaningless code will impact one's ability to answer questions about the code that was spat. And even then, I'm not sure the design of the study can answer that question.
  • hrefna@hachyderm.io (#68), replying to jenniferplusplus@hachyderm.io, who wrote:

        So, back to the paper.

        "How AI Impacts Skill Formation"
        https://arxiv.org/abs/2601.20245

        The very first sentence of the abstract:

        > AI assistance produces significant productivity gains across professional domains, particularly for novice workers.

        1. The evidence for this is mixed, and the effect is small.
        2. That's not even the purpose of this study. The design of the study doesn't support drawing conclusions in this area.

        Of course, the authors will repeat this claim frequently. Which brings us back to MY priors, which are that this is largely a political document.

    @jenniferplusplus oh gods, I need to read this.
  • jenniferplusplus@hachyderm.io (#69), in reply to #66:

    @mattly I mean, yes. But still.

    Maybe what I don't understand is why everyone else goes along with it.
  • jsbarretto@social.coop (#70), in reply to #64:

    @jenniferplusplus Kind of a funny statement, given that the whole point of abstraction, encapsulation, high-level languages, etc. is to provide a formal basis for much of a program to be designed in terms of high-level concepts.
  • hrefna@hachyderm.io (#71), in reply to #67:

    @jenniferplusplus That paper is _extremely damning_ of the use of AI for all that it bends over backwards and ties itself into knots to try to find some way of making it seem less catastrophically bad.
  • mattly@hachyderm.io (#72), in reply to #69:

    @jenniferplusplus I was talking with a friend recently about their workplace's new mandate for using tokens,

    and like, it's not "let's talk about this reasonably and decide what the best course of action is,"

    it's "get in losers, we're going to sloptown," and if you don't fall in line you're going to lose your job.

    And probably also some of the chatbot psychosis that kicks in with people who by all other metrics strike me as kratom addicts. They justify what they need to.
  • jenniferplusplus@hachyderm.io (#73), in reply to #71:

    @hrefna It certainly doesn't make them look good. But I'm honestly not sure we can draw *any* conclusion from this study. Which I'm getting into now.
  • realn2s@infosec.exchange (#74), replying to jenniferplusplus@hachyderm.io, who wrote:

        > We find that using AI assistance to complete tasks that involve this new library resulted in a reduction in the evaluation score by 17% or two grade points (Cohen's d = 0.738, p = 0.010). Meanwhile, we did not find a statistically significant acceleration in completion time with AI assistance.

        I mean, that's an enormous effect. I'm very interested in the methods section now.

        > Through an in-depth qualitative analysis where we watch the screen recordings of every participant in our main study, we explain the lack of AI productivity improvement through the additional time some participants invested in interacting with the AI assistant.

        ...

        Is this about learning, or is it about productivity!? God.

        > We attribute the gains in skill development of the control group to the process of encountering and subsequently resolving errors independently

        Hm. Learning with instruction is generally more effective than learning through struggle. A surface-level read would suggest that the stochastic chatbot actually has a counter-instructional effect. But again, we'll see what the methods actually are.

        Edit: I should say, doing things with feedback from an instructor generally has better learning outcomes than doing things in isolation. I phrased that badly.
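For readers unfamiliar with the effect size quoted above: Cohen's d is the difference between two group means expressed in units of their pooled standard deviation, so d = 0.738 means the groups differ by roughly three-quarters of a standard deviation, which by the usual rule of thumb (0.2 small, 0.5 medium, 0.8 large) is a medium-to-large effect. A minimal sketch of the computation, with made-up scores that are not the paper's data:

```python
# Illustrative only: Cohen's d for two independent groups, using the pooled
# standard deviation. The example scores below are invented, not the paper's.
import statistics


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd


control = [82.0, 78.0, 90.0, 85.0, 74.0, 88.0]   # hypothetical evaluation scores
assisted = [70.0, 75.0, 68.0, 80.0, 72.0, 65.0]  # hypothetical evaluation scores
print(round(cohens_d(control, assisted), 2))     # 1.96 with these toy numbers
```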

    realn2s@infosec.exchange replied:

    @jenniferplusplus
    I'm a bit confused.

    Aren't lower grades worse?
    And it even took longer because of the "AI distractions"?

  • sci_photos@troet.cafe (#75), in reply to #61:

    @jenniferplusplus 🙄
  • jenniferplusplus@hachyderm.io (#76), in reply to #67:

    I guess this brings me to the study design. I'm struggling a little to figure out how to talk about this. The short version is that I don't think they're testing any of the effects they think they're testing.

    So, they start with a warmup coding round, which seems to be mostly to let people become familiar with the tool. That's important, because the tool is commercial software for conducting coding interviews in a browser. They don't say which one, that I've seen.

    Then they have two separate toy projects that the subjects should complete. Project 1 is a non-blocking ticker, using a specific async library. Project 2 is some async I/O record retrieval with basic error handling, using the same async library (a rough sketch of both tasks follows below).

    And then they take a quiz about that async library.

    But there are some very important details. The coding portion and the quiz are both timed. The subjects were instructed to complete them as fast as possible. And the testing platform did not seem to have code completion or, presumably, any other modern development affordance.
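To make the two toy projects concrete, here is a rough sketch of what tasks like these typically look like. It uses plain asyncio because the thread doesn't name the specific async library the study used, and the structure, names, and file path are assumptions for illustration, not the paper's actual assignment.

```python
# Rough, hypothetical sketch of the two toy tasks described above, using plain
# asyncio as a stand-in for the unnamed async library from the study.
import asyncio


async def ticker(interval: float, count: int) -> None:
    """Task 1: a non-blocking ticker that yields to the event loop between ticks."""
    for i in range(count):
        print(f"tick {i}")
        await asyncio.sleep(interval)  # suspends this task without blocking others


async def retrieve_record(path: str) -> str | None:
    """Task 2: record retrieval with handling for missing files."""
    def _read() -> str:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()

    try:
        # Plain file I/O is blocking, so push it onto a worker thread.
        return await asyncio.to_thread(_read)
    except FileNotFoundError:
        return None  # a missing record is handled, not raised


async def main() -> None:
    tick_task = asyncio.create_task(ticker(0.5, 5))    # runs concurrently
    record = await retrieve_record("records/42.json")  # hypothetical path
    print("retrieved:", record)
    await tick_task


if __name__ == "__main__":
    asyncio.run(main())
```

The ticker keeps running as a background task while the retrieval awaits its file, which is presumably the "non-blocking" behavior the assignment is after.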

  • sci_photos@troet.cafe (#77), in reply to #62:

    @jenniferplusplus I agree; LLM-generated code (above a certain threshold of complexity) is like compiled C code with -O2 turned on. Hard to read, very hard to understand.
    Code can get “compressed” quite a lot.
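An invented illustration of the kind of "compression" being described: both functions below compute the same result, but the dense version takes much more effort to review line by line.

```python
# Invented example to illustrate the "compressed" point above; both functions
# return the lower-cased emails of the ten highest-scoring active users.

def top_emails_dense(users: list[dict]) -> list[str]:
    return [u["email"].lower() for u in sorted(
        (u for u in users if u.get("active") and "@" in u.get("email", "")),
        key=lambda u: (-u.get("score", 0), u["email"]))][:10]


def top_emails_explicit(users: list[dict]) -> list[str]:
    eligible = []
    for user in users:
        email = user.get("email", "")
        if user.get("active") and "@" in email:
            eligible.append(user)
    # Highest score first; break ties alphabetically by email.
    eligible.sort(key=lambda u: (-u.get("score", 0), u["email"]))
    return [user["email"].lower() for user in eligible[:10]]
```

Nothing is wrong in either version; the difference is purely in how much work it takes a reviewer to convince themselves of that.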

  • sci_photos@troet.cafe (#78), in reply to jenniferplusplus@hachyderm.io's "Chapter 4. Methods" post (quoted before #61):

    @jenniferplusplus Oh.
    I was more thinking of a two-week hackathon setting with multiple teams, lots of 🍕, and an evaluation of all the different phases, like:
    * planning (choosing the right library, based on LLM "discussions"),
    * tests + implementations,
    * searching for bugs,
    * adapting to spontaneous "changes" by the customer,
    * readability / maintainability by other teams.

    But … this … 🙄