Hey :) For a while now I've been using gpt-oss-20b on my home lab for lightweight coding tasks and some automation. I'm not really up to date on current self-hosted LLMs, and since the model I'm using was released at the beginning of August 2025 (which, from an LLM development perspective, feels like an eternity), I wanted to tap the collective wisdom of Lemmy to see whether there's something better out there to replace it.

Edit:

Specs:

GPU: RTX 3060 (12GB vRAM)

RAM: 64 GB

gpt-oss-20b does not fit into VRAM completely, but partially offloaded it is still reasonably fast (enough for me).
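Partial offload works by splitting the model's transformer layers between GPU and CPU, so only some layers need to fit in VRAM. As a rough back-of-the-envelope check (the layer count, file size, and reserve figure below are illustrative assumptions, not measurements of gpt-oss-20b), you can estimate how many layers fit in a given VRAM budget:

```python
def layers_on_gpu(vram_gb, n_layers, model_gb, reserve_gb=1.5):
    """Rough estimate of how many transformer layers fit in VRAM.

    Assumes layers are roughly equal in size and reserves some VRAM
    for the KV cache and runtime overhead. Purely illustrative.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# Hypothetical figures: a quantized model weighing ~12 GB on disk
# with 24 layers, on a 12 GB RTX 3060:
print(layers_on_gpu(vram_gb=12, n_layers=24, model_gb=12.0))  # → 21
```

Anything that doesn't fit runs on the CPU, which is why partially offloaded models are slower but still usable.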

  • nutbutter@discuss.tchncs.de · 21 minutes ago

    Have you tried the new gemma4 models? The e4b fits in the 12 GB of VRAM and is pretty good. Or you can use the 31b too, if you're okay with offloading to CPU.

  • James R Kirk@startrek.website · 5 hours ago

    Just curious, what does “some automation” entail? I thought LLMs could only work with text, like summarizing documents and that sort of thing.

    • a1studmuffin@aussie.zone · 4 hours ago

      These days they can also chain together tools, keep a working memory, etc. Look at Claude Code if you’re curious. It’s come very far very quickly in the last 12 months.
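The tool-chaining idea above can be sketched as a simple loop: the model either requests a tool call or gives a final answer, and tool results are fed back into its context. This is a minimal illustration with a stubbed "model" standing in for a real LLM; every name here is made up for the example:

```python
import json

# Hypothetical tool registry: the model picks a tool by name,
# and the loop executes it with the supplied arguments.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(messages):
    """Stub standing in for an LLM. A real model would decide when to
    emit a structured tool call; here the first turn is hard-coded."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}"}

def agent_loop(user_prompt, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "tool" in reply:  # model requested a tool call
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": json.dumps(result)})
        else:  # model produced a final answer
            return reply["answer"]

print(agent_loop("What is 2 + 3?"))  # → The result is 5
```

Real agent frameworks add structured tool schemas, error handling, and persistent memory on top of this basic request/execute/feed-back cycle.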

  • jaschen306@sh.itjust.works · 6 hours ago

    I’m running gemma4 26b MoE for most of my agent calls. I use glm5:cloud for my development agent because the 26b struggles when the context window gets too big.

  • ejs@piefed.social · 10 hours ago

    I suggest looking at the LLM arena leaderboards filtered by open-weight models. They offer thorough, statistically detailed benchmarks and are usually updated quickly when new models come out. The new Gemma that just came out might be the best for a single GPU, and if you have a lot of VRAM, check out the larger Chinese models.

  • Jozzo@lemmy.world · 12 hours ago

    I find Qwen3.5 is the best at tool calling and agent use; otherwise Gemma4 is a very solid all-rounder and should be the first one you try. Tbh gpt-oss is still good to this day, are you running into any problems with it?

    • Tanka@lemmy.ml (OP) · 9 hours ago

      No problems per se. I just realized I hadn’t checked for an update in a long time.

  • tal@lemmy.today · 12 hours ago

    I’m not on there, but you might have more luck in !localllama@sh.itjust.works

    You might also want to list the hardware that you plan to use, since that’ll constrain what you can reasonably run.

  • cron@feddit.org · 12 hours ago

    The latest open-weights model from Google might be a good fit for you. The 26B model works pretty well on my machine, though the performance isn’t great (6 tokens per second, CPU-only).

  • carzian@lemmy.ml · 12 hours ago

    I’m in the same boat. You’ll get better responses if you post your machine specs.