all the criticism has been said, all the takes been had.

louka@mementomori.social

@jonny I believe addiction treatment centres are seeing the same as you. Is that peer reviewed hard research or them capitalizing on what people feel like they need treatment for? I don't know.
https://www.naadac.org/treating-internet-addiction-ai-pornography-social-media-online-gambling-gaming

option8@oldbytes.space

@jonny Just one more commit. This time will be different.

rhelune@todon.eu

@prema @jonny "I can quit anytime, I just choose not to."
"I am only a social slopper. I never consume slop when alone at home."
"I only use chatbots with my morning coffee. Coffee doesn't taste good without a chatbot."
"Without a chatbot I wouldn't know what to do with my hands."
"I would have quit, but I do not want to get fat."
"All my colleagues use slop, I also need to use slop for the sake of networking."
"If I quit nobody is going to invite me out anymore because I won't be fun to be around."

glassresistor@sfba.social

@jonny @ainmosni exactly. winning early and being on the dopamine chasing side of neurospicy helps kick start it.

i tried AI a bit and omg when it works SHOT TO THE VEIN. it takes the things you hate feeling and makes them gone, but moving through bad feelings is like exercise: you can't keep sick gains without daily effort

datarama@hachyderm.io

@ainmosni @jonny I don't get the rush, but I *do* get sucked into loops I can't easily break out of (I have OCD). I get sucked into "completionist mode" in games, which means that if I play a game where there are collectibles you can buy with actual money, I'm quite likely to make very stupid decisions.

I've learned my lesson, I avoid such games scrupulously, and I steer clear of gambling.

And I am very distraught that my workplace now wants me to use AI.

tempusfelix@wehavecookies.social

@jonny

Agreed. I don’t gamble because I know I’m vulnerable. And the use of AI really is the “just one more” sensation. It’s too damned easy to get a plausible output. And it’s hard to stop.

catch56@kolektiva.social

@concretedog @jonny @ainmosni also don't gamble, but I think I am closer to avoidance due to not wanting to fall into a hole than to a complete lack of interest. I internally predict election results and other stuff, I just don't put actual money on it.

Don't smoke because I know it would be hard to stop and rarely play computer games for the same reason. Don't use LLMs and a big part of that is seeing other people get sucked in.

glassresistor@sfba.social

@jonny so this thread could be a case study

https://social.losno.co/@chris/116655930139554496

jonny@neuromatch.social

@glassresistor
Yeah I saw that and got sad

jonny@neuromatch.social

So, look. One shot rewriting the whole test suite in another language is probably not great to do, but what happened here is so much worse than you are expecting.

https://github.com/RsyncProject/rsync/pull/903/

This does not "translate tests into pytest" or a unit testing framework, it writes its own testing framework where tests are whole python scripts that redefine basic test functions in every script. Surely there would be a single way to "run rsync and get the results" - nope, well, there is, but then every test file will randomly redefine its own _run_and_capture function. So like now rsync needs a test suite for its test suite.

If instead of telling an LLM to "rewrite the tests in python" you just searched "python testing" you would find the pytest docs. And then you would find examples. And then you could write fixtures to deduplicate all the prior shell script setup and teardown stuff, and so on. But since it was just "rewrite the tests in python" it's now worse than before, and the odds of the rewrite actually being a 100% faithful translation are close to 0.
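A minimal sketch of the "single way to run rsync and get the results" idea, using only the standard library. This is not code from the PR: `RunResult`, `run_rsync`, and the `rsync_bin` parameter are hypothetical names, and only `run_and_capture` echoes the helper mentioned above. The point is that it is defined once and imported by every test instead of being redefined per file.

```python
import subprocess
import sys
from typing import NamedTuple, Optional, Sequence


class RunResult(NamedTuple):
    """What a test usually needs to inspect after running a command."""
    returncode: int
    stdout: str
    stderr: str


def run_and_capture(argv: Sequence[str], cwd: Optional[str] = None,
                    timeout: float = 60.0) -> RunResult:
    """Run a command once and capture its text output.

    A non-zero exit status is returned, not raised, so each test
    decides for itself what counts as a failure.
    """
    proc = subprocess.run(
        list(argv), cwd=cwd, capture_output=True,
        text=True, timeout=timeout, check=False,
    )
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def run_rsync(*args: str, rsync_bin: str = "rsync", **kwargs) -> RunResult:
    """Hypothetical wrapper: tests pass rsync arguments, never a full command line."""
    return run_and_capture([rsync_bin, *args], **kwargs)


if __name__ == "__main__":
    # Demonstrated with the Python interpreter so the sketch runs without rsync.
    result = run_and_capture([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout.strip())
```

In a pytest suite a helper like this would typically live in `conftest.py` and reach the tests through a fixture, alongside fixtures for the setup and teardown the shell scripts used to do.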

arrjay@tacobelllabs.net

@jonny *screaming intensifies*

ricci@discuss.systems

@jonny

jonny@neuromatch.social

I think the modal situation here is that the people are reading none or very little of what is being generated by the LLM, so the tests have a special role: Tests function as the pull arm on the slot machine, you just generate until tests pass, and that's a jackpot. Obviously that's meaningless when the tests are meaningless, so tests take on a very different meaning and role in slot machine coding.

Previously we would write careful test conditions that were based off some real problem or an understanding of what the code under test did, and had a specific thing they were intended to protect against. Tests move slow and are designed to protect us against the things we know can go wrong. When we learn of a new wrong thing, we add a test.

LLM tests have the form of tests but don't do the same thing. They often test nothing, and are just expressions of truisms that the probabilistic text space explored while generating. They have strongly worded names but end up actually asserting that basic language features work as expected. Because it is not us writing tests for ourselves, where we only harm ourselves by making them weak, they function instead as a passively obfuscated justification for the code that the LLM generates. The user wants the tests to pass. The LLM provides.

The tests are theater: they are the play field for the slot machine. They are mild, surmountable, need to fail a few times to be plausible, but must eventually pass within the expected generation loop window to deliver the payout.
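To make the "strongly worded name, asserts basic language features" pattern concrete, here is a small invented contrast. Nothing in it comes from the code discussed in the thread: `normalize` is a hypothetical function under test, and the `impl` parameter exists only so the difference can be shown by swapping in a broken implementation.

```python
import posixpath


def normalize(path):
    """Hypothetical code under test: collapse '..' and duplicate slashes."""
    return posixpath.normpath(path)


def test_robust_path_normalization_truism(impl=normalize):
    # Looks thorough, but only asserts that Python itself works.
    path = "/srv//data/../logs"
    result = impl(path)
    assert result == result        # a value equals itself
    assert isinstance(path, str)   # a string literal is a string


def test_normalize_collapses_parent_and_slashes(impl=normalize):
    # Pins down one behaviour a broken implementation would get wrong.
    assert impl("/srv//data/../logs") == "/srv/logs"
```

Swapping in a do-nothing implementation such as `lambda p: p` leaves the first test green and turns the second one red, which is the whole difference between the two.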

aud@fire.asta.lgbt

@jonny@neuromatch.social oh holy FUCK

it is so bad

peterrenshaw@ioc.exchange

@jonny “tests have a special role”

jonny@neuromatch.social

Here's an example from some code that was thrust at me this week. The rest of the tests try a bit harder to look like tests, but this one is perplexing.

What does it test? The function name suggests it's a smoke test. LLMs love to call things smoke tests. That would suggest this would be an early-run test that fails loudly if some basic precondition - like having ffmpeg - fails. Or, I guess we are smoke testing the ensure_ffmpeg function? Anyway who knows. However, we first check if ffmpeg or ffprobe are present, which is exactly what ensure_ffmpeg does. If they aren't present, a warning tells us that ffmpeg/ffprobe are required for the video tests, which makes it seem like this should be a parameterizing test that controls which tests are run, which of course it does not do.

So the test literally does nothing and cannot possibly fail, but says it does at least two things, because to an LLM something saying it does something is the same thing as it actually doing that thing.
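The code being described is not reproduced in this thread, so the following is only a guessed reconstruction of the pattern from the description above, not the original. The body of `ensure_ffmpeg` is a hypothetical stand-in, and the injectable `which` parameter is an assumption added so the behaviour can be shown without ffmpeg installed.

```python
import shutil
import warnings


def ensure_ffmpeg(which=shutil.which):
    """Hypothetical stand-in: True only if both binaries are found on PATH."""
    return all(which(tool) is not None for tool in ("ffmpeg", "ffprobe"))


# Named without the test_ prefix so a test runner does not auto-collect them.
def smoke_ffmpeg_vacuous(which=shutil.which):
    # Repeats the check ensure_ffmpeg already makes, warns, and returns.
    if which("ffmpeg") is None or which("ffprobe") is None:
        warnings.warn("ffmpeg/ffprobe are required for the video tests")
        return
    ensure_ffmpeg(which)  # result is never asserted on, so nothing can fail


def smoke_ffmpeg_asserting(which=shutil.which):
    # Fails loudly when the precondition is not met.
    assert ensure_ffmpeg(which), "ffmpeg/ffprobe are required for the video tests"
```

With both binaries missing, the first version emits a warning and passes, while the second raises `AssertionError`. If the intent were instead to control which tests run, pytest's skip markers would be the usual tool, but that is a different job from a smoke test.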

knutson_brain@sfba.social

@jonny
The model is metastatic…

bms48@mastodon.social

@jonny I have seen this pattern of which you speak when attempting to use LLMs to compare TCP Delayed-ACK implementations between BSD derived code bases. They generated output suggesting semantics that just weren't there, presumably based on how similarly named things were between each fork, but this was not obvious without reading the source for oneself in context. This went doubly for FreeBSD where there are multiple TCP functional blocks ("stacks").

jonny@neuromatch.social

To a person, the whole purpose of the test is for it to fail when it should. That's an elemental part of writing good tests: they must fail before the patch, or else they provide no protection. We want protection from failure, that is good for us. We need tests to protect us because we can't possibly evaluate all the other parts of a complex system when we try to fix one part of it.

LLM slot machines change what tests mean - of course we still want the code to work good, but if we're not evaluating the code or the tests, then what the slot machine turns them into is just a high score and the jackpot condition. 130 new tests added, that means it's good. They pass, that means I win.

The bugfix loop with LLMs defeats the purpose of automated tests and renders it no better than manual testing: you notice a bug, you yell at the LLM to fix it, you keep looking at the specific thing that's broken until it's fixed, good robot, ship it. The changes don't have meaningful tests, and nothing else does either, so the slot machine loop repeats, bug->fix->win. Very velocity. Rocket fuel even.
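The "must fail before the patch" rule from the start of this post can be checked mechanically. A minimal sketch with an invented bug: `clamp_buggy`, `clamp_fixed`, and both checks are hypothetical and stand in for the code before the patch, the code after it, and two candidate regression tests.

```python
def clamp_buggy(value, low, high):
    """Before the patch: the upper bound is ignored."""
    return max(low, value)


def clamp_fixed(value, low, high):
    """After the patch: both bounds are applied."""
    return min(high, max(low, value))


def check_upper_bound(clamp):
    # Targets the actual bug: a value above the range must come back as `high`.
    assert clamp(15, 0, 10) == 10


def check_returns_int(clamp):
    # Passes before and after the patch, so it protects nothing.
    assert isinstance(clamp(5, 0, 10), int)


def fails(check, impl):
    """Return True if `check` raises AssertionError against `impl`."""
    try:
        check(impl)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    for check in (check_upper_bound, check_returns_int):
        print(check.__name__,
              "fails before patch:", fails(check, clamp_buggy),
              "| fails after patch:", fails(check, clamp_fixed))
```

Only a check that fails against the old code and passes against the new one says anything about the fix. A check that is green on both sides only adds to the count.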

murodegrizeco@toad.social

@jonny

Huh, yeah, gambling works as a metaphor!

I was going more with ...

"This person came back from the alien ship, they had a grabby thing on their face but it fell off, and now the person is weirdly hungry, and maybe we should worry about what's growing inside them."