Technically, LLMs of course just generate text one token at a time. But it's an over-simplification to say somebody just loaded one up with Reddit, Project Gutenberg, and PirateBay and used that to generate text similar to what it has seen. As you say, an advanced Markov generator.

LLMs do get trained on all that more-or-less-ethically obtained text, but that's just step 1. Next comes fine-tuning, and last reinforcement learning. Reinforcement is what makes the difference: the model is trained not just to produce text it has seen, but text that makes the trainer happy.

Part of what makes the trainer happy is that the answer is correct (or reinforces their existing bias). Sometimes that's more or less verbatim quoting a Reddit shitpost, sometimes that's picking out points from an academic paper, and sometimes it's just random gibberish formatted so it looks right.

At the core, an LLM has three things: a model of what language looks like, a non-trivial percentage of human knowledge in the form of the internet, and an incentive to provide answers that make the trainer happy.

We do not know what that model looks like, except in very simple cases. It works surprisingly well for what it is, but it is not intelligent.

People didn't hate Google Translate or image search before they were rebranded as AI; both use simpler versions of the same technology. Artists happily use Photoshop features like "select subject" or "Content-Aware Fill," which are driven by neural networks. There is much more to AI than just generative AI, and parts of it are genuinely useful.

The problem is not LLMs themselves (ignoring, for the moment, the ethics of how they are trained), it's the AI companies having to hype them up. IMO, the correct response is not to claim they are not useful for anything – they obviously are to a lot of people – it's to challenge what they are used for.
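For contrast, here's roughly what an actual Markov-style generator looks like – a toy bigram model in Python (an illustrative sketch, all names mine). It can only replay word transitions it has literally seen in its training text; there is no fine-tuning and no reward signal steering it toward answers anyone likes, which is exactly the part the "advanced Markov generator" framing leaves out.

```python
import random

def train_bigrams(text):
    """Count bigram successors: word -> list of words that followed it in training."""
    words = text.split()
    model = {}
    for a, b in zip(words, words[1:]):
        model.setdefault(a, []).append(b)
    return model

def generate(model, start, length=10):
    """Walk the chain: sample each next word from successors seen in training."""
    out = [start]
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:  # dead end: this word was never followed by anything
            break
        out.append(random.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the", length=8))
```

Every sentence this thing emits is stitched from transitions that literally occur in the corpus – which is why it can never be "trained to make the trainer happy," only to parrot.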