Here is a sad (and somewhat pathetic, I guess) fact: The new Firefox "smart window" (which is an LLM-based browser) doesn't even use a local or open model; it's literally just Google's models run via their API

18 posts · 8 posters · 31 views
This thread has been deleted. Only users with topic management privileges can see it.

  • freddy@social.security.plumbing

    @pojntfx @buherator Not true. The feature is currently in development and uses different models while things are still under test. This is pre-release software; behavior will change.
    Currently, everything is proxied through Mozilla infra. The model that will ship is (as far as I understand) not yet determined.

  • pojntfx@mastodon.social (#9)

    @freddy @buherator I hope there is at least an option of using a local LLM; heck, even GLM-4.6V is good enough for instrumenting browsers in my experience. Signing into an account (thereby tying all of my LLM context directly to my identity with Mozilla) and proxying via Mozilla infrastructure to Google (which does not anonymise anything, since the context contains everything already) seems like a terrible direction here, seriously. Especially given that there are lots of ways to run LLMs locally.

  • kstrlworks@techhub.social

    @pojntfx Mozilla will really invest everywhere except making a better browser.

  • pojntfx@mastodon.social (#10)

    @kstrlworks Servo honestly seems like the only way forward.

  • madsenandersc@social.vivaldi.net

    @pojntfx To be fair, most computers are unable to run a meaningful LLM at a speed that makes sense.

    Yes, you can run a gemma-3-4b model on a CPU, but it is really very limited and tends to hallucinate quite a lot. I don't know of any open models that would do significantly better, but I would love to be proven wrong.

  • pojntfx@mastodon.social (#11)

    @madsenandersc You're not wrong in a lot of ways. But I'll also say that recent advances in quantization (I'm using the GLM-4.6V model) and the Vulkan acceleration support in, say, llama.cpp are making a big difference. My RX 4060 and AMD 890M are more than good enough to instrument a browser with a fully local LLM now.
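
    A minimal sketch of that kind of local setup with llama-cpp-python, assuming a GPU-enabled build (Vulkan, ROCm or CUDA) and a quantized GGUF file already on disk; the model path, quantization level and prompts below are placeholders:

        # Sketch: load a quantized GGUF model locally and offload all layers to the GPU.
        # The file name is a placeholder for whatever quantized model is actually on disk.
        from llama_cpp import Llama

        llm = Llama(
            model_path="models/some-model-q4_k_m.gguf",  # any quantized GGUF
            n_gpu_layers=-1,  # offload every layer to the GPU the build was compiled for
            n_ctx=8192,       # context window; raise it if VRAM allows
        )

        resp = llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You summarise web pages."},
                {"role": "user", "content": "Summarise: <page text here>"},
            ],
            max_tokens=256,
        )
        print(resp["choices"][0]["message"]["content"])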

  • kstrlworks@techhub.social (#12)

    @pojntfx Be it Servo or Ladybird, the first one to reach stable status with uBlock Origin will feed the masses.

  • madsenandersc@social.vivaldi.net (#13)

    @pojntfx Oh, for sure - with that kind of hardware things start to look different.

    I'm still not sold on models below 7-12B, and even without quantization, I feel that they tend to hallucinate a bit too much. With quantization, things tend to get even more... creative. 😉

    I run gpt-oss-20b locally, and that is fine for some tasks, but the moment things involve searching the web, the results become very much a mixed bag. I run it on a homelab with a Radeon 780M, but the great thing is that I can allocate up to 32GB of RAM to the GPU - that makes things not very fast, but reasonably accurate.

    The second part is language. I have found very few smaller models that speak Danish well enough to be useful (gemma-3-4b is one), so again - sending the query to a remote server with a 120B or 403B model makes much more sense from a user-centric standpoint.

  • pojntfx@mastodon.social (#14)

    @kstrlworks Ladybird's governance issues really make it not a viable alternative in my eyes. Solid engineering, but damn, I won't be working with someone who believes I shouldn't be working or even exist.

  • pojntfx@mastodon.social (#15)

    @madsenandersc Huh, interesting - yeah, I never really deal with languages other than French, German and English, I guess, so I haven't really run into this. For web search, https://newelle.qsk.me/#home has been surprisingly good with an 18B model, even though it's slow.

    I guess one way they could implement the whole remote-server situation would be to lean on, say, an OpenAI-compatible API - which something like vLLM, llama.cpp, SGLang and so on can provide.
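
    A minimal sketch of that integration path, assuming a local OpenAI-compatible server (for example llama.cpp's llama-server, vLLM or SGLang) is already listening on localhost; the port, API key and model name are placeholders for whatever the local server actually exposes:

        # Sketch: talk to a locally hosted OpenAI-compatible endpoint with the standard
        # openai client, so no page content or prompt leaves the machine.
        from openai import OpenAI

        client = OpenAI(
            base_url="http://localhost:8080/v1",  # local server's OpenAI-compatible endpoint
            api_key="not-needed",                 # local servers usually ignore the key
        )

        resp = client.chat.completions.create(
            model="local-model",  # whatever model name the local server registered
            messages=[{"role": "user", "content": "Summarise this page: <page text here>"}],
            max_tokens=256,
        )
        print(resp.choices[0].message.content)

    The same client code would keep working unchanged if the base_url were pointed at a hosted service instead, which is what makes the OpenAI-compatible route attractive as a user-facing setting.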

  • freddy@social.security.plumbing (#16)

    @pojntfx I have seen PRDs that discuss local models, but I don't recall if they are part of the MVP. Might just be that folks can set a custom pref in about:config for now.

  • tommi@pan.rent (#17)

    @pojntfx Noooo @mala you cannot allow this

  • madsenandersc@social.vivaldi.net (#18)

    @pojntfx

    I've never heard of Newelle before - that seems really promising. It definitely does a good job of searching the web when responding to queries, better than what I get from OpenWebUI and Ollama.

    Yes, implementing a simple OpenAI-compatible interface where the user can connect to their local AI installation would be almost a given - it would remove a lot of worries about privacy for those who want to keep their information in-house.

    My wife works at a place where there are a lot of industry secrets, so using cloud-based AI is a no-go for them, even if it is just for aggregating data from the web or summarizing their own information from a lot of different documents. For them, local AI is not an option; it is a requirement.
