FARVEL BIG TECH

We'll see how I feel in the morning, but for now i seem to have convinced myself to actually read that fuckin anthropic paper

92 Posts 29 Posters 13 Views

jenniferplusplus@hachyderm.io

> We find that using AI assistance to complete tasks that involve this new library resulted in a reduction in the evaluation score by 17% or two grade points (Cohen’s d = 0.738, p = 0.010). Meanwhile, we did not find a statistically significant acceleration in completion time with AI assistance.
I mean, that's an enormous effect. I'm very interested in the methods section, now.
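For context on why that number stands out: Cohen's d expresses a difference between two group means in units of their pooled standard deviation, so d = 0.738 means the two groups' average scores were roughly three-quarters of a standard deviation apart (Cohen's own rough conventions call 0.5 "medium" and 0.8 "large"). A minimal Python sketch of the textbook pooled-SD formula; the function name is illustrative, and this is not necessarily the exact variant the paper used.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n-1 (sample) denominator
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

A positive result means `group_a` scored higher; swapping the arguments flips the sign.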
> Through an in-depth qualitative analysis where we watch the screen recordings of every participant in our main study, we explain the lack of AI productivity improvement through the additional time some participants invested in interacting with the AI assistant.
...
Is this about learning, or is it about productivity!? God.
> We attribute the gains in skill development of the control group to the process of encountering and subsequently resolving errors independently
Hm. Learning with instruction is generally more effective than learning through struggle. A surface-level read would suggest that the stochastic chatbot actually has a counter-instructional effect. But again, we'll see what the methods actually are.
Edit: I should say, doing things with feedback from an instructor generally has better learning outcomes than doing things in isolation. I phrased that badly.
catch56@kolektiva.social

#46

@jenniferplusplus I think the 'control group' here didn't use AI at all. At least that's how I read it. And they completed the task in roughly the same time, with results two grade points better.

jenniferplusplus@hachyderm.io

And now we have actual research questions! It feels like it shouldn't take this long to get these, but w/e
1. Does AI assistance improve task completion productivity when new skills are required?
2. How does using AI assistance affect the development of these new skills?
We'll learn how the authors propose to answer these questions in the next chapter: Methods.
But first, there is a 6 year old in here demanding I play minecraft, and I'd rather do that.
To be continued... probably
weekend_editor@mathstodon.xyz

#47

@jenniferplusplus
There's a whole series of recent studies from MIT, CMU, Boston Consulting Group, BBC, and Oxford Economics arguing that AI/LLM assistants do NOT improve productivity.
Walk-through here:
https://www.someweekendreading.blog/ai-update-2026/

dalias@hachyderm.io

@jenniferplusplus The purpose of a paper is the assumptions it makes.
jenniferplusplus@hachyderm.io

#48

@dalias Not all the time. But if it's research conducted and published by the in-house research team of Anthropic? Yeah, probably

dalias@hachyderm.io

#49

@jenniferplusplus Yeah. Or if there are conflicts of interest in the funding, or if the researchers are just aspiring to getting hired into the industry or getting VC for their own ideas.

jenniferplusplus@hachyderm.io

#50

@weekend_editor

jenniferplusplus@hachyderm.io

#51

Chapter 4. Methods.
Let's go
First, the task. It's uh. It's basically a shitty whiteboard coding interview. The assignment is to build a couple of demo projects for an async python library. One is a non-blocking ticker. The other is some I/O ("record retrieval", not clear if this is the local filesystem or what, but probably the local fs) with handling for missing files.
Both are implemented in a literal whiteboard coding interview tool. The test group gets an AI chatbot button, and encouragement to use it. The control group doesn't.
/sigh
I just. Come on. If you were serious about this, it would be pocket change to do an actual study
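To make the task description concrete: the async library from the paper isn't named in this post, so this sketch uses Python's standard `asyncio` as a stand-in, and every function name (`ticker`, `retrieve_record`, `main`) is hypothetical. It shows a ticker that awaits between ticks instead of blocking, and a record lookup that reports a missing file as `None` rather than crashing. Assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
from pathlib import Path
from typing import List, Optional, Tuple


async def ticker(interval: float, count: int) -> List[int]:
    """Produce `count` ticks, awaiting `interval` seconds between them.

    asyncio.sleep suspends only this task, so other tasks keep running
    while the ticker waits (the "non-blocking" part).
    """
    ticks = []
    for i in range(count):
        ticks.append(i)
        await asyncio.sleep(interval)
    return ticks


def _read_record(path: Path) -> Optional[str]:
    """Blocking read; a missing file is reported as None instead of raising."""
    try:
        return path.read_text()
    except FileNotFoundError:
        return None


async def retrieve_record(path: Path) -> Optional[str]:
    # Run the blocking read in a worker thread so the event loop stays free.
    return await asyncio.to_thread(_read_record, path)


async def main(paths: List[Path]) -> Tuple[List[int], List[Optional[str]]]:
    # Start the ticker, then fetch all records concurrently while it runs.
    tick_task = asyncio.create_task(ticker(0.01, 3))
    records = await asyncio.gather(*(retrieve_record(p) for p in paths))
    ticks = await tick_task
    return ticks, list(records)
```

Usage: `asyncio.run(main([Path("a.txt"), Path("missing.txt")]))` returns the list of ticks plus one entry per path, with `None` in place of any file that doesn't exist.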

glyph@mastodon.social

#52

@jenniferplusplus thank you so much for doing this. I skimmed and just couldn’t bring myself to read it all, and it’s nice to see someone doing a much deeper read but coming to largely the same conclusions.

jenniferplusplus@hachyderm.io

#53

@glyph i would do this more, but the format of academic papers is so cumbersome. The time I actually have available for it is on the couch, after the kid's in bed. But reading these things on a phone is basically impossible

r343l@freeradical.zone

#54

@jenniferplusplus @glyph I had only read the anthropic summary. I was struck by how even if all their methods and study design were great (& a good sample etc) the results seemed to very much indicate LLM use isn't as transformative as the hype with major risks of deskilling impacts. I was surprised they published it just reading their own summary. I guess they had to make lemonade from lemons??

glyph@mastodon.social

#55

@jenniferplusplus all the more reason I appreciate you putting the effort in!

glyph@mastodon.social

#56

@r343l @jenniferplusplus as I put it earlier today: https://mastodon.social/@glyph/115992279951399934

enigma@norden.social

#57

@jenniferplusplus @glyph The current state of the industry is an advancing milestone, but it has a history of 60 years. The Turing test and Joseph Weizenbaum’s “Eliza” (the same test as Turing's) are passed easily by any machine. But the old myths about AI have not changed for many people.

enigma@norden.social

#58

@jenniferplusplus @glyph Being older too, I often compare AI with the moon landing of the 1960s, when AI also started professionally at MIT in the USA. The most puzzling question about Apollo’s success was: now that we have reached this goal that millions dreamed about, what do we want there? And what is our next stepping stone?

enigma@norden.social

#59

@jenniferplusplus @glyph my favorite fantasy and sci-fi book was and is Solaris by Stanislaw Lem (Poland, 1961)
Https://en.wikipedia.org/wiki/Solaris_%28novel%29
It is about a mystical ocean on a distant planet that materializes the traumas buried in human minds. Astronauts there are haunted by a deceased child or partner, e.g. one lost to suicide.
The upshot is that humanity tries to push its frontiers as far as possible to escape the daily routine on Earth, and only ends up facing itself, as in a mirror of the mind.

jenniferplusplus@hachyderm.io

#60

@r343l @glyph
As I've learned, they did some preregistration for the study. That might have influenced them.
And a whole bunch of these AI researchers really do seem to think of themselves as serious scientists doing important work. Particularly at Anthropic, as that's where a lot of the true believers ended up.

jenniferplusplus@hachyderm.io

#61

Found it! n=52. wtf. I reiterate: 20 billion dollars, just for this current funding round, and they only managed to do this study with 52 people.
But anyway, let's return to the methods themselves. They start with the design of the evaluation component, so I will too. It's organized around 4 evaluative practices they say are common in CS education. That seems fine, but their explanation for why these things are relevant is weird.
1. Debugging. According to them, "this skill is crucial for detecting when AI-generated code is incorrect and understanding why it fails."
Maybe their definition is more expansive than it seems here? But it's been my experience, professionally, that this is just not the case. The only even sort-of reliable mechanism for detecting and understanding the shit behavior of slop code is extensive validation suites.
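One way to picture the kind of validation suite described here is differential testing: run the code under test against an obviously-correct reference on many generated inputs plus a few hand-picked edge cases, and report the first disagreement. A minimal sketch, assuming a hypothetical generated function `candidate_dedupe` checked against a hypothetical reference `reference_dedupe` (neither comes from the paper or this thread):

```python
import random


def reference_dedupe(xs):
    """Obviously-correct reference: unique values in first-seen order."""
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def candidate_dedupe(xs):
    """Hypothetical stand-in for generated code under test."""
    return list(dict.fromkeys(xs))


def validate(candidate, reference, trials=1000, seed=0):
    """Return the first input where candidate and reference disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 10))]
        # Pass copies so a mutating candidate can't corrupt the comparison.
        if candidate(list(xs)) != reference(list(xs)):
            return xs
    # Deliberate edge cases that random inputs might miss.
    for xs in ([], [0], [1, 1, 1]):
        if candidate(list(xs)) != reference(list(xs)):
            return xs
    return None
```

The fixed seed keeps any failure reproducible. Swapping in `lambda xs: sorted(set(xs))` as the candidate gets caught, because it loses first-seen order even though it looks plausible at a glance.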

jenniferplusplus@hachyderm.io

#62

2. Code Reading. "This skill enables humans to understand and verify AI-written code before deployment."
Again, not in my professional experience. It's just too voluminous and bland. And no one has time for that shit, even if they can make themselves do it. Plus, I haven't found anyone who can properly review slop code, because we can't operate without the assumptions of comprehension, intention, and good faith that simply do not hold in that case.

jonny@neuromatch.social

#63

@jenniferplusplus the latter part is especially true and i don't have any sort of strategy for handling it. i have to read every single line of LLM code because the space of possible mistakes it can make is so large. with humans, even if someone really doesn't know what they are doing, there are only so many kinds of things that could conceivably screw up.

jenniferplusplus@hachyderm.io

#64

3. Code writing. Honestly, I don't get the impression they even understand what this means. They say "Low-level code writing, like remembering the syntax of functions, will be less important with further integration of AI coding tools than high-level system design."
Neither of those things is a meaningful facet of actually writing code. Writing code exists entirely in-between those two things. Code completion tools basically eliminate having to think about syntax (but we will return to this). And system design happens in the realm of abstract behaviors and responsibilities.

jenniferplusplus@hachyderm.io

#65

4. Conceptual. As they put it, "Conceptual understanding is critical to assess whether AI-generated code uses appropriate design patterns that adheres to how the library should be used."
IIIIIII guess. That's not wrong, exactly? But it's such a reverse centaur world view. I don't want to be the conceptual bounds checker for the code extruder. And I don't understand why they don't understand that.