I've been saying "if AI is making you so productive, then where is all this great new software?" and I guess the answer is that the software is out there. It's just not great. It's terrible, and nobody is using it.
-
If you want to make completely different points to the one answered - sure!
Has any artist ever compensated another for having looked at their paintings while learning to draw?
Has any budding coder ever compensated others for having studied their code to learn how to do things?
I'm all for lambasting shitty tech bro AI companies, but that's not the same as claiming that any and all LLM usage is bad. I suggest looking at Mistral AI as a European company that's building datacenters using fully renewable energy and ethically sourced data.
@troed not the same thing and you know it. People looking at things and storing copies of someone else's potentially copyrighted data for training are two completely different things. Is it so hard to admit that there are externalities and they are bad no matter how you slice it?
Mistral AI is your run-of-the-mill AI company that does not disclose what's in their training sets, just like everybody else: https://help.mistral.ai/en/articles/347390-does-mistral-disclose-its-training-datasets
-
@gabrielesvelto LLMs don't "store copies" when they train. What happens in their neural networks is very similar to what happens in a human brain when learning.
"run of the mill": https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai/
-
@ahto So you're _guessing_ that if I hadn't created those programs someone else would've and that means that the argument that LLMs indeed produce apps that people use is wrong?
I'm sure you believe that you're great at debating.
> So you're _guessing_ that if I hadn't created those programs someone else would've and that means that the argument that LLMs indeed produce apps that people use is wrong?
I'm suggesting that it would be incorrect to claim that they (or something like it) wouldn't exist without you and an LLM.
> I'm sure you believe that you're great at debating.
Look at you go! Feeling so sure about my beliefs!
-
"Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models"
> "We show that finetuning bypasses these protections: by training models to expand plot summaries into full text, a task naturally suited for commercial writing assistants, we cause GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to reproduce up to 85-90% of held-out copyrighted books, with single verbatim spans exceeding 460 words, using only semantic descriptions as prompts and no actual book text"
-
Are you asking for a lesson in how LLMs work or did you just want to show off ignorance?
-
Oh, those are the only choices? I was just adding a link that was relevant and that you might want to read

-
Let's do it like this: Are there limits to how much you can compress data?
(This is a Computer Science 101 question so don't spend too long on it)
-
I'm not here to answer a measure theory question.
I'm just pointing out an article that goes against yours.
-
Oh this isn't theory. Let's try again: Is there a limit to how much you can compress data?
I get why you don't _want_ to answer, since the answer proves that with the laws of physics in this universe LLMs don't store copies of their training data.
-
> Oh this isn't theory. Let's try again: Is there a limit to how much you can compress data?
Well, I'm going to say that it depends on the domain, but you are probably after something specific here.
> I get why you don't _want_ to answer, since the answer proves that with the laws of physics in this universe LLMs don't store copies of their training data.
Oh, please do tell!
-
No, it doesn't "depends".
https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
If LLMs stored their training data we would apparently be able to compress all of human knowledge into files easily downloadable onto regular computers, since that's the size of LLM models.
They don't. They do however learn and have better memories than human brains so they can indeed regurgitate 460 words in a row (that's from the paper you linked) from a source in some cases.
If you want to play debate, try learning the subject matter first.
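
A back-of-the-envelope sketch of the size argument above. Every number here is an illustrative assumption, not a figure for any real model or dataset: a hypothetical 70-billion-parameter model stored at 2 bytes per parameter, a hypothetical corpus of 40 trillion characters, and an entropy of about 1 bit per character, which sits inside the 0.6 to 1.3 bits range Shannon estimated for printed English.

```python
def min_lossless_bytes(num_chars: float, bits_per_char: float) -> float:
    """Smallest size, in bytes, that any lossless code could reach for
    num_chars characters from a source with the given entropy per character
    (the lower bound from the source coding theorem)."""
    return num_chars * bits_per_char / 8


def model_bytes(num_params: float, bytes_per_param: float) -> float:
    """Size of the model weights in bytes."""
    return num_params * bytes_per_param


# All values below are assumptions chosen for illustration only.
CORPUS_CHARS = 40e12          # hypothetical training corpus size in characters
ENTROPY_BITS_PER_CHAR = 1.0   # assumed entropy of English text
NUM_PARAMS = 70e9             # hypothetical parameter count
BYTES_PER_PARAM = 2           # assumed 16-bit weights
BOOK_CHARS = 500_000          # assumed length of one novel in characters

corpus_floor = min_lossless_bytes(CORPUS_CHARS, ENTROPY_BITS_PER_CHAR)
weights = model_bytes(NUM_PARAMS, BYTES_PER_PARAM)
ratio = corpus_floor / weights
book_floor = min_lossless_bytes(BOOK_CHARS, ENTROPY_BITS_PER_CHAR)

print(f"corpus lower bound: {corpus_floor / 1e12:.1f} TB")
print(f"model weights:      {weights / 1e9:.0f} GB")
print(f"corpus bound / weights: {ratio:.1f}x")
print(f"one book lower bound:   {book_floor / 1e3:.1f} kB")
```

Under these assumptions the weights are far too small to hold a lossless copy of the whole corpus. The same arithmetic says nothing about whether individual passages or books are retained, since a single book is tiny next to the weights.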
-
> No, it doesn't "depends".
Oh okay! I guess it was silly of me to assume some constraints. However, I guess you win this one!
> If LLMs stored their training data we would apparently be able to compress all of human knowledge into files easily downloadable onto regular computers, since that's the size of LLM models.
Oh okay! So... LLMs don't store them at all?
> They don't. They do however learn and have better memories than human brains so they can indeed regurgitate 460 words in a row (that's from the paper you linked) from a source in some cases.
But now you are saying they do? If they could regurgitate 460 words in a row, sounds like they stored it or have some kind of memory of it right?
Like, that article was stating "reproduce up to 85-90% of held-out copyrighted books"
So... there is some representation in which one could consider it a form of compression. And HEY, look at that:
"A review of state-of-the-art techniques for large language model compression" https://link.springer.com/article/10.1007/s40747-025-02019-z
-
No, it doesn't "depends".
https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
If LLMs stored their training data we would apparently be able to compress all of human knowledge into files easily downloadable onto regular computers, since that's the size of LLM models.
They don't. They do however learn and have better memories than human brains so they can indeed regurgitate 460 words in a row (that's from the paper you linked) from a source in some cases.
If you want to play debate, try learning the subject matter first.
@troed good, what do you know about modern neuroscience? Because you know what they say: extraordinary claims require extraordinary proof. And you claimed that LLMs memorize things like the human brain does. Can you prove it? Because @ahto provided one of several peer-reviewed articles that prove without question that LLMs store high-probability training data essentially verbatim. But you didn't provide proof that the human brain stores sparse matrices and multiplies them.
-
@dome @0x0961h @jargoggles @eniko Did you just post text an LLM gave you? Pull your head out of your ass and type your own response.
-
@eniko AI enjoyers always react very emotionally to criticism

-
@troed not the same thing and you know it. People looking at things and storing copies of someone else's potentially copyrighted data for training are two completely different things. Is it so hard to admit that there are externalities and they are bad no matter how you slice it?
Mistral AI is your run-of-the-mill AI company that does not disclose what's in their training sets, just like everybody else: https://help.mistral.ai/en/articles/347390-does-mistral-disclose-its-training-datasets
AI is being treated like that movie "Blood Diamond". I believe many are unaware, due to the loss of transparency, of the costs involved in pulling off that simple free prompt generation: which creators were stolen from, which towns had their houses demolished, or what resources were extracted for AI infrastructure.
Quili.ai made an ethical point by sitting in as a team of human AI prompt responders to save their town from water scarcity, revealing the community we have forgotten.
-
> No, it doesn't "depends".
Oh okay! I guess it was silly of me to assume some constraints. However, I guess you win this one!
> If LLMs stored their training data we would apparently be able to compress all of human knowledge into files easily downloadable onto regular computers, since that's the size of LLM models.
Oh okay! So... LLMs don't store them at all?
> They don't. They do however learn and have better memories than human brains so they can indeed regurgitate 460 words in a row (that's from the paper you linked) from a source in some cases.
But now you are saying they do? If they could regurgitate 460 words in a row, sounds like they stored it or have some kind of memory of it right?
Like, that article was stating "reproduce up to 85-90% of held-out copyrighted books"
So... there is some representation in which one could consider it a form of compression. And HEY, look at that:
"A review of state-of-the-art techniques for large language model compression" https://link.springer.com/article/10.1007/s40747-025-02019-z
You're trying to argue against what amounts to natural laws, from your own lack of knowledge about the area.
That's the same thing as climate deniers do.
-
@troed good, what do you know about modern neuroscience? Because you know what they say: extraordinary claims require extraordinary proof. And you claimed that LLMs memorize things like the human brain does. Can you prove it? Because @ahto provided one of several peer-reviewed articles that prove without question that LLMs store high-probability training data essentially verbatim. But you didn't provide proof that the human brain stores sparse matrices and multiplies them.
I recommend Susan Blackmore's "Consciousness: An Introduction" and Douglas Hofstadter's "I Am a Strange Loop" if you want more insight into modern neuroscience.
-
@troed @ahto I guess you mean scientific laws, because natural laws are a philosophical concept. I also suppose you meant climate *change* deniers, which, besides being uncalled-for name-calling, is a bit ironic given that worsening climate change is indeed one of the externalities. Anyway, back to the case in point: it's not about a handful of words. We now have plenty of literature proving that models retain verbatim copies of high-probability training inputs regardless of what Shannon's theorem says, e.g.:
https://arxiv.org/pdf/2601.02671v1
> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984
Anyway, this brings me to another question. Why defend these systems in the face of the damage they do? Because you used them to write some software? It was your *expertise* that enabled you to do it, not the tool. That's the lie at the heart of these systems.
← where I wish sloperators were put