We'll see how I feel in the morning, but for now I seem to have convinced myself to actually read that fuckin Anthropic paper
-
Chapter 3. Framework.
Finally.
Paraphrasing a little: the "learning by doing" philosophy connects completing real-world tasks with learning new concepts and developing new skills. Experiential learning has also been explored to mimic solving real-world problems. We focus on settings where workers must acquire new skills to complete tasks. We seek to understand both the impact of AI on productivity and skill formation. We ask whether AI assistance presents a tradeoff between immediate productivity and longer-term skill development, or if AI assistance presents a shortcut to enhance both.
Right. There it is again: productivity. Even within this framing, there are at least three more possibilities: that AI does not actually increase productivity; that AI has no effect at all; or that AI improves learning only. I think it's very telling that the authors don't even conceive of these options. Particularly the last one.
But I'm becoming more and more convinced that the framing of productivity as an essential factor to measure and judge by is itself the whole purpose of this paper. And, specifically, productivity as defined by production output. But maybe I'm getting ahead of myself.
And now we have actual research questions! It feels like it shouldn't take this long to get these, but w/e
1. Does AI assistance improve task completion productivity when new skills are required?
2. How does using AI assistance affect the development of these new skills?
We'll learn how the authors propose to answer these questions in the next chapter: Methods.
But first, there is a 6 year old in here demanding I play minecraft, and I'd rather do that.
To be continued... probably
-
@inthehands @jenniferplusplus One of my personal hesitations about using the LLM tools much (despite incredible professional pressure to do so) is that my use of them (again, under professional necessity) has reinforced my pre-existing belief that struggling through a problem, debugging, digging through source, and so on has been CRITICAL to my skill development. It's something I have for (uh) 15+ years told less experienced software developers is critical to getting better / faster!
@r343l @inthehands @jenniferplusplus
“struggling through a problem, debugging & digging through source & so on has been CRITICAL to my skill development” … because the “cognitive struggle” is like doing physical exercise or activity to get your body and brain better + faster at doing it.
Making a request & waiting for the output result is like ordering a meal from a restaurant menu & somehow expecting that action to make you an expert chef. At most, you become an expert at ordering off a menu.
-
@jenniferplusplus "Learning with instruction is generally more effective than learning through struggle"
I'm not sure I agree! The desirable-difficulties literature and the metacognition literature both agree that short-term failures can lead to better long-term retention (people's lack of belief in this is often pointed to as a reason we engage in inefficient problem solving). That is one reason project-based learning can sometimes beat sage-on-a-stage lectures.
Eg classic lit here: https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf
@grimalkina I think I phrased that badly. I'm aware and agree that doing a thing, mistakes and all, very often has better learning outcomes than lectures from experts.
What I meant was doing a thing with guidance and feedback from an expert has better outcomes than doing it in isolation.
-
@dalias Honestly, yes. I suspect the purpose of this paper is to reinforce that production is a correct and necessary factor to consider when making decisions about AI.
And secondarily, I suspect it's establishing justification for blaming workers for undesirable outcomes; it's our fault for choosing to learn badly.
@jenniferplusplus
The purpose of a paper is the assumptions it makes.
-
> We find that using AI assistance to complete
tasks that involve this new library resulted in a reduction in the evaluation score by 17% or two grade
points (Cohen’s d = 0.738, p = 0.010). Meanwhile, we did not find a statistically significant acceleration in
completion time with AI assistance.

I mean, that's an enormous effect. I'm very interested in the methods section, now.
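For scale: Cohen's d is just the standardized mean difference between two groups, so d = 0.738 means the group means sat about three-quarters of a pooled standard deviation apart. A minimal sketch of the computation, with made-up scores (NOT the paper's data):

```python
# Cohen's d: (mean difference) / (pooled standard deviation).
# The scores below are invented for illustration only.
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n-1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

control = [82, 75, 90, 78, 85]   # hypothetical evaluation scores
ai_group = [70, 65, 72, 60, 68]
print(round(cohens_d(control, ai_group), 2))
```

Conventionally, d around 0.2 is called small, 0.5 medium, and 0.8 large, which is why 0.738 from n=52 is a striking result.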
> Through an in-depth qualitative analysis where we watch the screen recordings of every participant in our
main study, we explain the lack of AI productivity improvement through the additional time some participants
invested in interacting with the AI assistant....
Is this about learning, or is it about productivity!? God.
> We attribute the gains in skill development of the control group to the process of encountering and subsequently resolving errors independently
Hm. Learning with instruction is generally more effective than learning through struggle. A surface level read would suggest that the stochastic chatbot actually has a counter-instructional effect. But again, we'll see what the methods actually are.
Edit: I should say, doing things with feedback from an instructor generally has better learning outcomes than doing things in isolation. I phrased that badly.
@jenniferplusplus I think the 'control group' here didn't use AI at all. At least that's how I read it. And they completed the task in more or less the same time, with results two grades better.
-
There's a whole series of recent studies from MIT, CMU, Boston Consulting Group, BBC, and Oxford Economics arguing that AI/LLM assistants do NOT improve productivity.
Walk-through here:
-
@dalias Not all the time. But if it's research conducted and published by the in-house research team of Anthropic? Yeah, probably
@jenniferplusplus Yeah. Or if there are conflicts of interest in the funding, or if the researchers are just aspiring to getting hired into the industry or getting VC for their own ideas.
-
Chapter 4. Methods.
Let's go
First, the task. It's uh. It's basically a shitty whiteboard coding interview. The assignment is to build a couple of demo projects for an async python library. One is a non-blocking ticker. The other is some I/O ("record retrieval", not clear if this is the local filesystem or what, but probably the local fs) with handling for missing files.
Both are implemented in a literal whiteboard coding interview tool. The test group gets an AI chatbot button, and encouragement to use it. The control group doesn't.
/sigh
I just. Come on. If you were serious about this, it would be pocket change to do an actual study
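For context, here's roughly what those two tasks might look like in plain stdlib asyncio. This is my guess at the shape of the assignment; the thread doesn't name the actual library or spec, so every name and behavior below is an assumption:

```python
# Hedged sketch of the two described tasks, using stdlib asyncio.
# The paper's actual library and task specs aren't given here.
import asyncio

async def ticker(interval: float, count: int) -> None:
    """Print a tick every `interval` seconds without blocking the event loop."""
    for i in range(count):
        print(f"tick {i}")
        await asyncio.sleep(interval)  # yields control; other tasks keep running

async def retrieve_record(path: str) -> str:
    """Read a record from the local filesystem, handling missing files."""
    loop = asyncio.get_running_loop()
    try:
        # File I/O is blocking, so push it onto the default thread-pool executor.
        return await loop.run_in_executor(None, lambda: open(path).read())
    except FileNotFoundError:
        return ""  # arbitrary fallback for a missing record

async def main() -> None:
    # Run the ticker concurrently with a record retrieval.
    tick_task = asyncio.create_task(ticker(0.01, 3))
    record = await retrieve_record("records/42.txt")
    await tick_task
    print(repr(record))

asyncio.run(main())
```

Even as a sketch, this is maybe thirty lines of junior-level work, which underlines the point: it's a whiteboard exercise, not a realistic measure of on-the-job skill acquisition.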
-
@jenniferplusplus thank you so much for doing this. I skimmed and just couldn’t bring myself to read it all, and it’s nice to see someone doing a much deeper read but coming to largely the same conclusions.
-
@glyph I would do this more, but the format of academic papers is so cumbersome. The time I actually have available for it is on the couch, after the kid's in bed. But reading these things on a phone is basically impossible
-
@jenniferplusplus @glyph I had only read the anthropic summary. I was struck by how even if all their methods and study design were great (& a good sample etc) the results seemed to very much indicate LLM use isn't as transformative as the hype with major risks of deskilling impacts. I was surprised they published it just reading their own summary. I guess they had to make lemonade from lemons??
-
@jenniferplusplus all the more reason I appreciate you putting the effort in!
-
@r343l @jenniferplusplus as I put it earlier today: https://mastodon.social/@glyph/115992279951399934
-
@jenniferplusplus @glyph The state of the industry today is a milestone of ongoing progress, but it has a history of 60 years behind it. The Turing test and Joseph Weizenbaum's "Eliza" (the same kind of test) are passed easily by any machine now. But for many people, the ancient myths about AI haven't changed.
-
@jenniferplusplus @glyph I'm older too, and I often compare AI with the moon landing of the 1960s, when AI also got its professional start at MIT in the USA. The most confusing question about Apollo's success was: now that we have reached this goal that millions dreamed about, what do we want there? And what is our next stepping stone?
-
@jenniferplusplus @glyph my beloved fantasy and sci-fi book was and is Solaris by Stanislaw Lem (Poland, 1961):
https://en.wikipedia.org/wiki/Solaris_%28novel%29
A mystic ocean on a distant planet materializes the traumas of the human mind. Astronauts there are haunted by a deceased child or partner, e.g. one lost to suicide.
The upshot is that humanity pushes its frontiers as far as possible to escape its daily routine on Earth, and only ends up facing itself, as in a mirror of the mind.
-
@r343l @glyph
As I've learned, they did some preregistration for the study. That might have influenced them.
And, a whole bunch of these AI researchers really do seem to think of themselves as serious scientists doing important work. Particularly at Anthropic, as that's where a lot of the true believers ended up.
-
Found it! n=52. wtf. I reiterate: 20 billion dollars, just for this current funding round, and they only managed to do this study with 52 people.
But anyway, let's return to the methods themselves. They start with the design of the evaluation component, so I will too. It's organized around 4 evaluative practices they say are common in CS education. That seems fine, but their explanation for why these things are relevant is weird.
1. Debugging. According to them, "this skill is crucial for detecting when AI-generated code is incorrect and understanding why it fails."
Maybe their definition is more expansive than it seems here? But it's been my experience, professionally, that this is just not the case. The only even sort-of reliable mechanism for detecting and understanding the shit behavior of slop code is extensive validation suites.
