LLM advocates still don’t seem to be able to comprehend that ordering the machine not to ‘make stuff up’ doesn’t help.
-
@benjamineskola Genuine question: could the LLM share its confidence in what it says is true? It's a probability machine, so could it say "I'm 90% sure I'm correct"?
@RoBo2 No. The probability of it generating a particular output is based on frequency, not correctness.
-
LLM advocates still don’t seem to be able to comprehend that ordering the machine not to ‘make stuff up’ doesn’t help. It doesn’t know when it’s making stuff up, and it couldn’t change that even if you told it to. (In fact it’s always just making stuff up, and is only ever true by chance.)
Part of why I’m so negative about them is that their advocates simply do not understand how they work and do not seem to want to.
@benjamineskola I wouldn't be surprised if it works. The LLMs have been trained to have a certain level of confidence when replying and to make quick guesses for "trivial" questions. Nudging them to be more thorough could cause them to check their work with deterministic tool calls more frequently. It's sort of part of the problem, though, that the actual correct way to use the technology is to repeat some superstitious incantation or to talk to it like a caveman.
-
@pontus_k But the tool has no conception of what is true or false. It can’t ‘check its work’ because it has no way of telling what is better and what is worse. What it would produce is something that has the appearance of a verification process; but it’s no more likely to be true.
-
@prietschka I do recall that a few weeks back he was complaining that LLM advocates get made to feel unwelcome on the fediverse. (OK? I don’t care. It’s nobody’s job to make people feel good about their bad opinions.)
And then just a couple of days ago he was posting something critical, and like … yes this is what we’ve been saying all along.
@benjamineskola @prietschka yup, which prompted one of the folks who works on mastodon to go on a weird "we want journalists to come to fedi, right? how can we entice them to come? they're not coming because fedi is a monoculture, and you all don't engage with their views..." good times
-
@patrick_h_lauke that’s the one. i’m happy for antisocial views to remain unwelcome tbh.
-
@benjamineskola happy for them to post, but then don't complain when nobody likes/subscribes/hits the bell button/follows them/whatever other made-up number-go-up metric they see as engagement
-
@patrick_h_lauke yes, true, that’s the problem; they want not only to be allowed to share their bad opinions but to be rewarded for doing so (with internet points).
-
@benjamineskola it's the "i used to have 2 million followers on twitter...don't you realise who i am?" mentality
-
@benjamineskola but it's ok, i have a second instance of a different LLM tasked with checking the output of the first LLM is CORRECT...
@patrick_h_lauke @benjamineskola
No, no, as one commenter in the original thread mentioned: have THREE different passes... /s
-
@benjamineskola For example, if I typed "the cat sat on the ", would it work out that the next word is probably "mat", with a score of 87%?
-
@benjamineskola
It’s not “hallucinations”.
It’s just putting a stream of words together on the specified subject in a SYNTACTICALLY correct order.
Nothing more, nothing less.
Semantics, accuracy, TRUTH don’t even enter into it.
FFS, it’s not “intelligent”. It’s code.
“Computers don’t make mistakes.” (remember that one?)
People make mistakes. People program computers. (At least, they used to.)
Cthulhu save us all.
-
@mysturji Yes, that is my point.
-
@RoBo2 Yes: probability. The sentence is a common one, so it’s likely to be reproduced in the output. But the LLM has no conception of whether a cat really did sit on the mat.
You probably could build an LLM so that it showed the probabilities of each token; but it wouldn’t solve the problem being discussed here at all.
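To illustrate the point about probabilities, here is a toy sketch. It assumes a word-level trigram model built from raw counts over a made-up corpus. Real LLMs use neural networks over subword tokens, so this is only an analogy. The score it reports for "mat" reflects how often that phrase appears in the data, and nothing in the calculation checks whether any cat sat anywhere.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (not real training data). "mat" wins below
# only because the phrase is frequent, not because it is true.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat sat on the sofa",
]

# Count which word follows each two-word context (a trigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b, nxt in zip(words, words[1:], words[2:]):
        follows[(a, b)][nxt] += 1


def next_word_probabilities(context):
    """Estimate P(next word | last two words) purely from counts."""
    counts = follows[tuple(context.split()[-2:])]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}


probs = next_word_probabilities("the cat sat on the")
for word, p in probs.items():
    print(f"{word}: {p:.0%}")  # mat: 60%, rug: 20%, sofa: 20%
```

Adding more "sofa" sentences to the corpus would shift these numbers. Nothing about which answer is actually true would have changed.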
-
@benjamineskola Some of these systems have access to deterministic tools that could give them a better output. For example, all LLMs struggle with counting letters, but in many cases they can call the Unix utility 'wc' to count them. Putting 'MAKE NO MISTAKES' in the prompt could make it a bit more likely that the model does so and gets it right. Don't get me wrong, I think it's absolutely stupid that this is where we are.
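Here is a minimal sketch of the kind of deterministic helper this reply has in mind. It assumes a plain Python function that a tool-calling system could be given. The names `count_letter` and `count_letters` are hypothetical and not part of any real agent framework. Unlike next-token prediction, these functions return the same correct count every time. Whether the model chooses to call them at all is a separate question.

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text.

    A deterministic helper a tool-calling system could invoke instead
    of predicting the answer token by token (hypothetical name).
    """
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return text.lower().count(letter.lower())


def count_letters(text: str) -> int:
    """Count alphabetic characters only, ignoring spaces and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


print(count_letter("strawberry", "r"))  # 3
print(count_letters("strawberry"))      # 10
```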
-
@benjamineskola A colleague who is adding "AGENTS.md" files to our repositories is putting very similar paragraphs in those files.
Ugh.
-
@pontus_k you don’t need to hunt for ways to make this make sense.
-
@juliancalaby That sort of thing bugs me so much. Like, if you insist on using these tools (and I know I'm not going to win the fight against them more generally), then at least use them properly.
I've tried to have conversations about 'how do we know whether this actually makes a difference' and so on, and I think it's probably better than it could be, but it's still very silly.
-
@benjamineskola I wrote our company's AI policy, added terms requiring short- and long-term evaluation of whether this is actually working for us, and management as a whole agreed, so it's company policy. Which is nice. However, the head of the company has gone very AI and is pulling the company in that direction despite ... well ... all the clear points against it, and the person who is functionally our sysadmin is heading up a project to add it to our workflows and is using it to do stuff with our infrastructure.
I'm now trying to keep them accountable and biding my time before this blows up in their faces.
Thankfully the "accountability" story is working out fairly well so far, but it's fucking exhausting dealing with this bullshit.
-
@benjamineskola The problem with Obasanjo is he's utterly unprincipled and just chasing engagement/self-aggrandizement. His purpose for being in social spaces like Masto/Bluesky/X is to stroke his ego, so everything he does is just an act of public masturbation.
He's interested in self-aggrandizement and self-promotion, nothing more.
Which is why I use the descriptor "piece of shit" with regard to him.
@prietschka @benjamineskola It’s refreshing to see people stating this plainly. These people are dumb, they make bad choices, and making that observation is not mean.