@ignaloidas @mjg59 @david_chisnall @engideer Now you are talking about absolute trust. I do think we are indeed talking about different things. Do you use LLMs? Do you assign the same level of trust to qwen-3.6 as to gpt-2? Because I do not, partly based on benchmarks, partly on personal experience, partly on my (admittedly perfunctory) theoretical understanding of their training and inference setups.
mnl@hachyderm.io
Posts
-
Free software people: A major goal of free software is for individuals to be able to cause software to behave in the way they want it to
LLMs: (enable that)
Free software people: Oh no, not like that
-
@ignaloidas @mjg59 @david_chisnall @newhinton I think you are misreading me: that is exactly what I am saying. I never fully trust my code, not a single line of it, partly because every line of my code usually requires billions of lines of code I haven't written in order to run. I can apply methods and use my experience to trust it enough to run it.
-
@ignaloidas @mjg59 @david_chisnall @engideer Temperature-based sampling is just one of many sampling modalities. Nucleus sampling, top-k, frequency penalties: all of these introduce controlled randomness to improve the performance of LLMs as measured by a wide variety of benchmarks.
A truly random sampling of tokens would be uniformly distributed… and the obviously grammatically correct sentences these models produce are a clear sign that we are not sampling tokens at random.
Are we talking about the same thing?
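To make the "controlled randomness" point concrete, here is a minimal sketch of temperature scaling plus top-k filtering. This is illustrative only, not any particular model's implementation; the function name and defaults are my own.

```python
import math
import random

def sample_token(logits, temperature=0.8, top_k=3):
    """Sample a token index from raw logits with temperature and top-k."""
    # Temperature scaling: values < 1 sharpen the distribution toward
    # the argmax, values > 1 flatten it toward uniform.
    scaled = [l / temperature for l in logits]
    # Top-k filtering: only the k highest-scoring indices survive,
    # so low-probability tokens can never be drawn at all.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the survivors (max subtracted for stability);
    # random.choices normalizes the weights internally.
    m = max(scaled[i] for i in ranked)
    weights = [math.exp(scaled[i] - m) for i in ranked]
    # A weighted draw: controlled randomness, not a uniform pick
    # over the whole vocabulary.
    return random.choices(ranked, weights=weights, k=1)[0]
```

With `top_k=1` this degenerates to greedy decoding; raising `temperature` spreads probability mass across the surviving candidates.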
-
@ignaloidas @mjg59 @david_chisnall @newhinton But "fairly sure" is not full trust. I can also be "fairly sure" that something works, but I'm not going to trust my judgment; instead I will try to validate it and provide proper guardrails so that if it misbehaves, it is at least contained. Some things will be just fine even if broken; others less so, and those will make me invest more of my time. I am not going to try to prove the kernel correct just because I am changing a CSS color. I don't see how that is different with LLMs, and I use them every day. If anything, they allow me to validate more.
-
@engideer @david_chisnall @mjg59 @ignaloidas Also, I didn't say any of what you quoted, and I don't know where you got it from.
-
@engideer @david_chisnall @mjg59 @ignaloidas I don't think LLMs are "randos". They have randomized elements during training and inference, but they're not a random number generator. I would also trust a "rando" less than an expert in real life, but I wouldn't trust either blindly.
-
@ignaloidas @mjg59 @david_chisnall @newhinton How did you gain your confidence? How can you call machine learning a bunch of dice? I try to study and build things every day, and yes, I don't trust my code at all, which I think is a healthy attitude to have. I am definitely not able to produce perfect code on the first try.
-
@ignaloidas @mjg59 @david_chisnall @newhinton Do you blindly trust code just because it's been written by a human? Or your own code, for that matter? I don't, and yet I am able to produce hopefully useful software. In fact I have to trust an immense amount of software without verifying it, based on vibes. For LLMs, at least, I can benchmark the vibes, or at least gather empirical observations more easily than with humans.
-
@ced I just read the primary source when I think it's useful to do so.
-
@ced @david_chisnall @mjg59 @ignaloidas @kagihq To the search engine thing: one reason I think they're usually more problematic to use is that there are actual incentives to make results worse. I switched to Kagi from google/duckduckgo before ChatGPT because the results were already complete trash.
Sure, I have to pay per search, but that's the only business model that at least enables non-gameable results.
-
@ignaloidas @mjg59 @david_chisnall @newhinton That's also not how current LLMs work; there is a significant amount of post-training using RL being done, and that too is a whole field of research.
Furthermore, current LLM-based tools usually do multiple rounds of inference interspersed with more traditional "tool calls" (or, as I prefer to call it, interpreting sampled tokens in a deterministic/formal manner).
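The "rounds of inference interspersed with tool calls" pattern can be sketched in a few lines. Everything here is hypothetical: the loop shape, the JSON convention, and the function names are illustrative, not any real agent framework's API.

```python
import json

def run_agent(model, tools, prompt, max_rounds=5):
    """Alternate sampled inference with deterministic tool execution."""
    context = prompt
    for _ in range(max_rounds):
        output = model(context)  # one round of (sampled) inference
        try:
            # Sampled tokens interpreted in a formal manner:
            # if the output parses as a structured tool request...
            call = json.loads(output)
        except json.JSONDecodeError:
            # ...otherwise treat it as plain text, i.e. the final answer.
            return output
        # ...dispatch it to a deterministic tool and feed the
        # result back into the context for the next round.
        result = tools[call["tool"]](*call.get("args", []))
        context += f"\n[tool {call['tool']} -> {result}]"
    return context
```

The tools themselves (compilers, search, file reads) are ordinary deterministic code; only the token sampling in `model` is stochastic.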
-
@ced @david_chisnall @mjg59 @ignaloidas Which search engine do you use? I use @kagihq and it's always a pleasure.
LLMs can provide information about sources. If they tell me that Shannon said x in his thesis on p.463, I can look it up. If they tell me that variable foo is on line X in file Y, I can easily verify it. If they think that Z compiles, I don't even need to cross-check that; the computer can do it for me. In fact, verifying certain assumptions about code might be the easiest of them all, which is why LLMs are quite effective at writing code.
-
@ignaloidas @mjg59 @david_chisnall @newhinton That's not how LLMs work, though; it being right 9 times out of 10 very much influences whether the 10th time will be correct. That's literally how models are trained. There's an entire research field out there that studies this.
-
@ced @david_chisnall @mjg59 @ignaloidas Do you not use a search engine (genuinely curious; I love building search engines and making them work well)?
Do you think it's impossible to assign varying degrees of trust to LLM output?
-
@ced @david_chisnall @mjg59 @ignaloidas Neither does an LLM? We are perfectly able to deal with, say, search engine results, which are arguably more problematic than LLMs. For all intents and purposes, the books and resources I have at my disposal are also the product of random processes. I can still work with them to learn things.
-
@newhinton @david_chisnall @mjg59 @ignaloidas I'm not really following. Using an LLM doesn't erase my brain the minute I use it, nor is it a random number generator whose answers I am forbidden to check; both of those still apply when I use LLMs.
-
@david_chisnall @mjg59 @ignaloidas I have encountered plenty of people and books that were wrong, so I still have to engage my brain and double check, though.
-
@david_chisnall @mjg59 @ignaloidas just like humans! Or books!
-
@david_chisnall @mjg59 @ignaloidas LLMs can be used to explain and learn things. Unsurprisingly, that's what many people do when things don't work, whether they were written by a human or not, and they want them to work.