Description
Name and Version
version: 5184 (87616f0)
built with MSVC 19.41.34120.0 for x64
Operating systems
Mac, Windows
Which llama.cpp modules do you know to be affected?
Other (Please specify in the next section)
Command line
llama-retrieval.exe --context-file <any_text_file> --chunk-size 1 -c 512 -t 8 -m bge-large-en-v1.5-f32.gguf
Problem description & steps to reproduce
The retrieval example fails to decode the token batches built from the text chunks, so no embeddings are produced.
It looks like we need to skip the KV-cache logic that searches for an unused slot when pooling is active (which is the case for the model above).
The following if in llama-context.cpp was removed, so we now fall into the slot-search logic and hit the decode errors shown in the log below.
// non-causal masks do not use the KV cache
if (hparams.causal_attn) {
    kv_self_update();
    // ...
}
Just adding a guard like "if (!embd_pooling)" appears to fix the issue, but I am not sure how it interacts with the original non-causal-mask handling for gemma-3.
First Bad Commit
Relevant log output
llama-retrieval.exe --context-file <any_text_file> --chunk-size 1 -c 512 -t 8 -m bge-large-en-v1.5-f32.gguf
...
init: CPU KV buffer size = 48.00 MiB
llama_context: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
llama_context: CPU compute buffer size = 27.01 MiB
llama_context: graph nodes = 825
llama_context: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 512
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
system_info: n_threads = 8 (n_threads_batch = 8) / 32 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
batch_decode: n_tokens = 2043, n_seq = 118
find_slot: n_tokens = 2043 > size = 512
decode: failed to find KV cache slot for ubatch of size 2043
llama_decode: failed to decode, ret = 1
get_embeddings_ith: invalid embeddings id 0, reason: no embeddings
batch_decode: failed to get embeddings for token 0
get_embeddings_ith: invalid embeddings id 1, reason: no embeddings
batch_decode: failed to get embeddings for token 1
get_embeddings_ith: invalid embeddings id 2, reason: no embeddings
batch_decode: failed to get embeddings for token 2
get_embeddings_ith: invalid embeddings id 3, reason: no embeddings
batch_decode: failed to get embeddings for token 3
...