
Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon #12367

Closed
@simonchen

Description

Name and Version

llama-server.exe --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
version: 4880 (2048b59)
built with MSVC 19.43.34808.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none

Models

gemma3:12b

Problem description & steps to reproduce

llama-server is unable to load the Gemma 3 12B GGUF model (an Ollama blob). Loading aborts with "key not found in model: gemma3.attention.layer_norm_rms_epsilon" (full log below). A quick way to confirm which metadata keys the file actually carries is sketched after this paragraph.
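A minimal sketch for dumping the relevant GGUF metadata keys with the gguf-py package that ships in the llama.cpp repo (pip install gguf); the model path is a placeholder for the blob under D:\OllamaModels\blobs:

    # A minimal sketch, assuming the gguf-py package is installed.
    from gguf import GGUFReader

    reader = GGUFReader("model.gguf")  # placeholder for the sha256-... blob path

    # Check for the key the loader reports as missing.
    key = "gemma3.attention.layer_norm_rms_epsilon"
    print(key, "present" if key in reader.fields else "MISSING")

    # List what the exporter actually wrote under gemma3.attention.*
    for name in reader.fields:
        if name.startswith("gemma3.attention."):
            print(name)

Against the dump in the log below, this would list only head_count, head_count_kv, key_length, sliding_window, and value_length under gemma3.attention., with no layer_norm_rms_epsilon key.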

First Bad Commit

No response

Relevant log output

llama-server.exe -m %file_path_gemma3_12b% --no-mmap -c 16384 -np 1 -ngl 50 --temp 0.1 -t 9 -tb 8 -C FF000 --no-perf --host 0.0.0.0 --port 3000 

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
build: 4880 (2048b591) with MSVC 19.43.34808.0 for x64
system info: n_threads = 9, n_threads_batch = 8, total_threads = 20

system_info: n_threads = 9 (n_threads_batch = 8) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: HTTP server is listening, hostname: 0.0.0.0, port: 3000, http threads: 19
main: loading model
srv    load_model: loading model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Iris(R) Xe Graphics) - 16224 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 1065 tensors from D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                gemma3.attention.head_count u32              = 16
llama_model_loader: - kv   1:             gemma3.attention.head_count_kv u32              = 8
llama_model_loader: - kv   2:                gemma3.attention.key_length u32              = 256
llama_model_loader: - kv   3:            gemma3.attention.sliding_window u32              = 1024
llama_model_loader: - kv   4:              gemma3.attention.value_length u32              = 256
llama_model_loader: - kv   5:                         gemma3.block_count u32              = 48
llama_model_loader: - kv   6:                      gemma3.context_length u32              = 8192
llama_model_loader: - kv   7:                    gemma3.embedding_length u32              = 3840
llama_model_loader: - kv   8:                 gemma3.feed_forward_length u32              = 15360
llama_model_loader: - kv   9:         gemma3.vision.attention.head_count u32              = 16
llama_model_loader: - kv  10: gemma3.vision.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                  gemma3.vision.block_count u32              = 27
llama_model_loader: - kv  12:             gemma3.vision.embedding_length u32              = 1152
llama_model_loader: - kv  13:          gemma3.vision.feed_forward_length u32              = 4304
llama_model_loader: - kv  14:                   gemma3.vision.image_size u32              = 896
llama_model_loader: - kv  15:                 gemma3.vision.num_channels u32              = 3
llama_model_loader: - kv  16:                   gemma3.vision.patch_size u32              = 14
llama_model_loader: - kv  17:                       general.architecture str              = gemma3
llama_model_loader: - kv  18:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:           tokenizer.ggml.add_padding_token bool             = false
llama_model_loader: - kv  22:           tokenizer.ggml.add_unknown_token bool             = false
llama_model_loader: - kv  23:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  24:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv  26:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  27:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  28:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  29:                      tokenizer.ggml.scores arr[f32,262145]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  30:                  tokenizer.ggml.token_type arr[i32,262145]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  31:                      tokenizer.ggml.tokens arr[str,262145]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  32:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - kv  34:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  563 tensors
llama_model_loader: - type  f16:  165 tensors
llama_model_loader: - type q4_K:  290 tensors
llama_model_loader: - type q6_K:   47 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 7.57 GiB (5.34 BPW)
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv    load_model: failed to load model, 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv   operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
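For what it's worth, llama.cpp's --override-kv option can inject a metadata key at load time (the dump above notes that KV overrides do not apply to the printed values). An untested sketch; the 0.000001 epsilon is an assumption borrowed from the gemma3.vision.attention.layer_norm_epsilon value in the dump, and the file may be missing further keys, so this may only move the failure to the next missing key:

llama-server.exe -m %file_path_gemma3_12b% --no-mmap -c 16384 --override-kv gemma3.attention.layer_norm_rms_epsilon=float:0.000001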
