In search of a new self-hosted LLM

Tanka@lemmy.ml · 16 hours ago

sobchak · edit-2 13 hours ago

I tried some newer ones recently (though I have a 24 GB GPU). Qwen3.5 9B is pretty impressive for such a small model on agentic stuff like Claude Code: I can run the Opus-distilled version quantized to 6-bit with the full 256k context and no CPU offloading. Gemma4 26B is good if I don't need agentic stuff or a lot of context (it sucks for agentic stuff). You can probably run the smaller versions of these, or the same ones with less context.
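
If you want to sanity-check whether a given model, quant, and context length might fit on your card, here is a rough back-of-the-envelope sketch. The function names are mine, and the KV cache formula assumes plain full attention in every layer with 16-bit cache values, which may not match how a particular model actually works, so treat the results as a ballpark rather than a guarantee. The layer count, KV head count, and head size in the usage part are made-up placeholders; look up the real ones in the model's config.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB.

    Assumes standard full attention in every layer: one key and one
    value vector per KV head, per layer, per token.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    # 9B parameters at a nominal 6 bits per weight.
    print(f"weights: {weight_gib(9, 6):.1f} GiB")

    # HYPOTHETICAL architecture numbers, for illustration only.
    print(f"kv cache: {kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context_tokens=8192):.1f} GiB")
```

A 9B model at a nominal 6 bits per weight comes out to roughly 6.3 GiB of weights, which is why it leaves so much room for context on a 24 GB card. Real quant formats store a little more than the nominal bit width, and the runtime needs some extra working memory, so leave yourself headroom. Since the cache grows linearly with context length, cutting the context is the easy knob to turn on a smaller GPU.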