Here is a sad (and somewhat pathetic, I guess) fact: the new Firefox "smart window" (which is an LLM-based browser) doesn't even use a local or open model; it's literally just Google's models run via their API.
-
@pojntfx @buherator not true. The feature is currently in development and uses different models while things are still under test. This is pre-release software. Behavior will change.
Currently, everything is proxied through Mozilla infra. The model that will ship is (afaiu) not yet determined.
@freddy @pojntfx @buherator Proxying this stuff doesn’t really address many of the privacy issues when someone’s shoving private personal data into it, as typically happens with AI-assisted browsing.
If someone’s just using an AI feature to summarize some random news articles or something, proxying the requests is probably fine for most users.
Confidential compute is better, but still not ideal.
-
@oliviablob @pojntfx @buherator I agree. If you want privacy, you probably shouldn’t use an LLM hosted and controlled by someone else. I certainly wouldn’t.
-
@freddy @buherator I hope there is at least an option of using a local LLM; heck, even GLM-4.6V is good enough for instrumenting browsers in my experience. Signing into an account (thereby tying all of my LLM context directly to my identity with Mozilla) and proxying via Mozilla infrastructure to Google (which does not anonymise anything, since the context already contains everything) seems like a terrible direction here, seriously. Esp. given that there are lots of ways to run LLMs locally.
-
@pojntfx Mozilla will really invest everywhere except making a better browser.
@kstrlworks Servo honestly seems like the only way forward.
-
@pojntfx to be fair, most computers are unable to run a meaningful LLM at a speed that makes sense.
Yes, you can run a gemma-3-4b model on a CPU, but it is really very limited and tends to hallucinate quite a lot. I don't know of any open models that would do significantly better, but I would love to be proven wrong.
@madsenandersc You're not wrong in a lot of ways. But I'll also say that recent advances in quantization (I'm using the GLM-4.6V model) and the Vulkan acceleration support in, say, llama.cpp are making a big difference. My RX 4060 and AMD 890M are more than good enough to instrument a browser with a fully local LLM now.
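A rough sketch of what that can look like in practice, in case it's useful - this assumes llama-cpp-python built with a GPU backend (Vulkan/ROCm/CUDA), and the model path and page text are just placeholders:
```python
# Summarize page text with a fully local model via llama-cpp-python.
# NOTE: the model path and page text below are placeholders, not a real setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-quantized-model.Q4_K_M.gguf",  # any local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU if the backend supports it
    n_ctx=8192,       # enough context for a chunk of extracted page text
)

page_text = "..."  # text pulled out of the current tab, however you extract it

result = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Summarize the following web page text in three bullet points."},
        {"role": "user", "content": page_text},
    ],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```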
-
@pojntfx Be it Servo or Ladybird, the first one to reach stable status with uBlock Origin will feed the masses.
-
@pojntfx oh, for sure - with that kind of hardware things start to look different.
I'm still not sold on models below 7-12B, and even without quantization, I feel that they tend to hallucinate a bit too much. With quantization, things tend to get even more...creative.

I run gpt-oss-20b locally, and that is fine for some tasks, but the second things involve searching the web, the results become very much a mixed bag. I run it on a homelab with a Radeon 780M, but the great thing is that I can allocate up to 32GB of RAM to the GPU - that makes things not very fast, but reasonably accurate.
The second part is language. I have found very few smaller models that speak Danish well enough to be useful (gemma-3-4b is one), so again - sending the query to a remote server with a 120B or 403B model makes much more sense from a user-centric standpoint.
-
@kstrlworks Ladybird's governance issues really make it not a viable alternative in my eyes. Solid engineering, but damn, I won't be working with someone who believes I shouldn't be working, or even exist.
-
@madsenandersc Huh, interesting - yeah, I never really deal with languages other than French, German and English, I guess, so I haven't really run into this. For web search, https://newelle.qsk.me/#home has been surprisingly good with an 18B model, even though it's slow.
I guess one way they could implement the whole remote server situation would be to lean on, say, an OpenAI-compatible API - which something like vLLM, llama.cpp, SGLang and so on can provide.
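For concreteness, a minimal sketch of the client side under that assumption - a local OpenAI-compatible server (llama.cpp's llama-server, vLLM, SGLang, ...) already listening on localhost:8080; the endpoint, API key and model name are placeholders:
```python
# Talk to a local OpenAI-compatible endpoint instead of a hosted one.
# NOTE: base_url, api_key and model name are placeholders for whatever the
# local server actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama.cpp / vLLM / SGLang server
    api_key="not-needed-locally",         # local servers usually accept any key
)

response = client.chat.completions.create(
    model="local-model",  # whatever model name the local server reports
    messages=[
        {"role": "user", "content": "Summarize this page for me: ..."},
    ],
)
print(response.choices[0].message.content)
```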
-
@pojntfx I have seen PRDs that discuss local models, but I don’t recall if they are part of the MVP. Might just be that folks can set a custom pref in about:config for now.
-
I'd never heard of Newelle before - that seems really promising. It definitely does a good job of searching the web when responding to queries, better than what I get from OpenWebUI and Ollama.
Yes, implementing a simple OpenAI-compatible interface, where the user can connect to their local AI installation, would be almost a given - it would remove a lot of worries about privacy for those who want to keep their information in-house.
My wife works at a place with a lot of industry secrets, so using hosted AI is a no-go for them, even if it is just for aggregating data from the web or summarizing their own information from a lot of different documents. For them, local AI is not just an option, it is a requirement.