
Misc. bug: Starting from b5450 to latest version, token generation rate for model Qwen3-30B-A3B is reduced to ~5 tok/s. #13738

Closed as duplicate of #13664
@xmgsincere

Description


Name and Version

b5450 through the latest version

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf  --port 1234 -c 4096 -ngl 46 -t 16 --no-warmup

Problem description & steps to reproduce

Starting from b5450 up to the latest version, the token generation rate for the Qwen3-30B-A3B model drops to ~5 tok/s. With b5449 or any earlier version, the token generation rate is about 22 tok/s. I'm using the Windows Vulkan x64 binary. My notebook PC platform: Lenovo ThinkBook 14 G7+ IAH, Intel Core Ultra 7 255H CPU, Intel Arc 140T iGPU, 32 GB RAM, Windows 11 24H2.
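One way to quantify the regression without the server in the loop is to run llama-bench from two release builds and compare the reported tg (token generation) rates. This is a sketch: the model path, -ngl, and -t values are taken from the command line above, while the two build directories are hypothetical placeholders for extracted b5449 and b5450 Windows Vulkan x64 release archives.

```shell
:: Hypothetical extraction paths for the two release builds; adjust to your setup.
:: Run the same benchmark with each binary and compare the tg tok/s columns.

:: b5449 or earlier (~22 tok/s generation observed on this machine)
C:\llama-b5449\llama-bench.exe -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf -ngl 46 -t 16 -p 512 -n 128

:: b5450 or later (~5 tok/s generation reported)
C:\llama-b5450\llama-bench.exe -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf -ngl 46 -t 16 -p 512 -n 128
```

Since the regression window is a single release (b5449 → b5450), comparing the commit log between those two tags would also narrow down the first bad commit directly.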

First Bad Commit

No response

Relevant log output
