Closed as duplicate of #13664
Description
Name and Version
b5450 through the latest version (Windows Vulkan x64 release binaries)
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf --port 1234 -c 4096 -ngl 46 -t 16 --no-warmup
Problem description & steps to reproduce
Starting from b5450 up to the latest version, the token generation rate for the Qwen3-30B-A3B model drops to ~5 tok/s, whereas with b5449 or any earlier version it is about 22 tok/s. I'm using the Windows Vulkan x64 binary. My notebook platform: Lenovo ThinkBook 14 G7+ IAH, Intel Core Ultra 7 255H CPU, Intel Arc 140T iGPU, 32 GB RAM, Windows 11 24H2.
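To quantify the regression, the bundled llama-bench tool can be run against both builds with the same model and offload settings; a minimal sketch, assuming the b5449 and b5450 Windows Vulkan x64 packages are unpacked into side-by-side directories (the directory names here are hypothetical):

REM b5449 build (reported ~22 tok/s text generation)
b5449\llama-bench.exe -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf -ngl 46 -t 16

REM b5450 build (reported ~5 tok/s text generation)
b5450\llama-bench.exe -m H:\models\Sowkwndms\Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF\qwen3-30b-a3b-abliterated-q4_k_m.gguf -ngl 46 -t 16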
First Bad Commit
No response
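In case it helps pin down the regression, a standard git bisect between the two release tags would identify the first bad commit; a sketch, assuming a local clone and a working Vulkan build environment (the cmake flags below are the usual Vulkan build options, not taken from this report):

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git bisect start b5450 b5449
REM at each bisect step: rebuild with Vulkan, re-run the llama-bench
REM measurement above, then mark the commit accordingly
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
git bisect good
REM (or: git bisect bad, if the slow generation rate reproduces)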