Description
Name and Version
./build/bin/llama-cli --version version: 5586 (3ac6753) built with clang version 20.1.4 for aarch64-unknown-linux-android24
Tested in the Chrome browser on Android.
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server --model ~/_memorycard/_y/Qwen3-1.7B-Q8_K_XL.gguf --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0.01 --seed 666 -fa --host 0.0.0.0 --no-warmup --jinja
Problem description & steps to reproduce
As per the title: when using the --jinja flag with models like Qwen3 or DeepSeek-R1-0528-Qwen3, during the thinking phase the web UI's three animated dots keep spinning; after a while the token counter updates, but no "thinking"/"thought process" icon or tags appear and none of the reasoning text is displayed, as if it never happened. When the reasoning ends, the actual answer starts streaming as usual. Without the --jinja flag, the reasoning is displayed normally. When using llama-cli instead, the reasoning appears inside <think>...</think> tags as expected.
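For what it's worth, the raw server output can be inspected directly, bypassing the web UI, with a request like the one below (a sketch, assuming the server is reachable on the default port 8080; the prompt is arbitrary). When --jinja template parsing is active, the reasoning may be returned in a separate field of the response rather than as inline <think> tags in the content, which could explain why the UI renders nothing during the thinking phase:

```shell
# Query the OpenAI-compatible chat completions endpoint directly
# to see where the reasoning text ends up in the JSON response.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 256
  }'
```

If the reasoning appears in the response message separately from the answer text rather than inside <think>...</think> in content, the server-side parsing is working and the problem would be on the web UI rendering side.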
First Bad Commit
No response