Skip to content

Eval bug: Ollama Runner uses only 1 CPU for its threads, in guaranteed mode in pod when 8 CPUs are allocated to it #14089

Open
@rsys-rkathpal

Description

@rsys-rkathpal

Name and Version

ollama version is 0.9.0

Operating systems

Linux

GGML backends

CPU

Hardware

Model name: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2

Ollama running on pod, with 8 CPUs allocated to pod

Models

llama-3-1-nemotron-nano-4b-v1-1-gguf with num_thread set as 8 (via modelfile)

Problem description & steps to reproduce

Running llama-3-1-nemotron-nano-4b-v1-1-gguf on pod with 8 vCPUs allocated in guranteed mode
num_thread set as 8 using modelfile
Running the ollama runner for any query, allocates only 1 CPU to it and all threads then run on this single CPU, while other CPUs do not run any thread.
Hence response becomes very slow

First Bad Commit

No response

Relevant log output

ime=2025-06-09T14:08:41.550Z level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible=[]
llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-6c2e3cd14694a79028e65683420b03ccbd3d7eea1e77448066fe1a8bdc2265aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 Nemotron Nano 4B v1.1
llama_model_loader: - kv   3:                            general.version str              = v1.1
llama_model_loader: - kv   4:                       general.organization str              = Nvidia
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1-Nemotron-Nano
llama_model_loader: - kv   6:                         general.size_label str              = 4B
llama_model_loader: - kv   7:                            general.license str              = other
llama_model_loader: - kv   8:                       general.license.name str              = nvidia-open-model-license
llama_model_loader: - kv   9:                       general.license.link str              = https://www.nvidia.com/en-us/agreemen...
llama_model_loader: - kv  10:                   general.base_model.count u32              = 1
llama_model_loader: - kv  11:                  general.base_model.0.name str              = Llama 3.1 Minitron 4B Width Base
llama_model_loader: - kv  12:          general.base_model.0.organization str              = Nvidia
llama_model_loader: - kv  13:              general.base_model.0.repo_url str              = https://huggingface.co/nvidia/Llama-3...
llama_model_loader: - kv  14:                      general.dataset.count u32              = 1
llama_model_loader: - kv  15:                     general.dataset.0.name str              = Llama Nemotron Post Training Dataset
llama_model_loader: - kv  16:             general.dataset.0.organization str              = Nvidia
llama_model_loader: - kv  17:                 general.dataset.0.repo_url str              = https://huggingface.co/nvidia/Llama-N...
llama_model_loader: - kv  18:                               general.tags arr[str,4]       = ["nvidia", "llama-3", "pytorch", "tex...
llama_model_loader: - kv  19:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  20:                          llama.block_count u32              = 32
llama_model_loader: - kv  21:                       llama.context_length u32              = 131072
llama_model_loader: - kv  22:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv  23:                  llama.feed_forward_length u32              = 9216
llama_model_loader: - kv  24:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  25:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  26:                       llama.rope.freq_base f32              = 3565775104.000000
llama_model_loader: - kv  27:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  28:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  29:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  30:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  31:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  32:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  33:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  34:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  35:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  36:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  37:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  38:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- if tools %}{{- '<|begin_of_text|>...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:   66 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 2.58 GiB (4.91 BPW) 
init_tokenizer: initializing tokenizer for type 2
load: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
load: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
load: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
load: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
load: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
load: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
load: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
load: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
load: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
load: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
load: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
load: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
load: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
load: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
load: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
load: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
load: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
load: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
load: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
load: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
load: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
load: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
load: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
load: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
load: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
load: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
load: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
load: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
load: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
load: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
load: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
load: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
load: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
load: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
load: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
load: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
load: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
load: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
load: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
load: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
load: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
load: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
load: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
load: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
load: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
load: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
load: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
load: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
load: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
load: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
load: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
load: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
load: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
load: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
load: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
load: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
load: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
load: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
load: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
load: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
load: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
load: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
load: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
load: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
load: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
load: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
load: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
load: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
load: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
load: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
load: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
load: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
load: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
load: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
load: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
load: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
load: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
load: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
load: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
load: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
load: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
load: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
load: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
load: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
load: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
load: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
load: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
load: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
load: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
load: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
load: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
load: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
load: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
load: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
load: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
load: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
load: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
load: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
load: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
load: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
load: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
load: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
load: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
load: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
load: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
load: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
load: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
load: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
load: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
load: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
load: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
load: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
load: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
load: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
load: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
load: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
load: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
load: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
load: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
load: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
load: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
load: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
load: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
load: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
load: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
load: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
load: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
load: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
load: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
load: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
load: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
load: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
load: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
load: control token: 128007 '<|end_header_id|>' is not marked as EOG
load: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
load: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
load: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
load: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
load: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
load: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
load: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
load: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
load: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
load: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
load: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
load: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
load: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
load: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
load: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
load: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
load: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
load: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
load: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
load: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
load: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
load: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
load: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
load: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
load: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
load: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
load: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
load: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
load: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
load: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
load: control token: 128001 '<|end_of_text|>' is not marked as EOG
load: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
load: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
load: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
load: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
load: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
load: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
load: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
load: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
load: control token: 128006 '<|start_header_id|>' is not marked as EOG
load: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
load: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
load: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
load: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
load: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
load: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
load: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
load: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
load: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
load: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
load: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
load: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
load: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
load: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
load: control token: 128000 '<|begin_of_text|>' is not marked as EOG
load: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
load: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
load: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
load: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
load: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
load: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
load: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
load: control token: 128010 '<|python_tag|>' is not marked as EOG
load: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
load: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
load: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
load: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
load: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
load: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
load: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
load: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
load: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
load: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
load: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
load: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
load: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
load: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
load: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
load: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
load: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
load: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
load: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
load: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
load: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
load: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
load: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
load: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
load: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
load: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
load: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
load: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
load: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
load: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
load: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
load: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
load: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
load: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
load: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
load: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
load: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
load: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
load: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
load: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
load: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
load: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
load: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
load: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
load: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
load: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
load: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
load: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
load: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
load: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
load: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
load: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
load: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
load: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
load: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
load: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
load: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 4.51 B
print_info: general.name     = Llama 3.1 Nemotron Nano 4B v1.1
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128009 '<|eot_id|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-06-09T14:08:42.028Z level=DEBUG source=gpu.go:695 msg="no filter required for library cpu"
time=2025-06-09T14:08:42.028Z level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /root/.ollama/models/blobs/sha256-6c2e3cd14694a79028e65683420b03ccbd3d7eea1e77448066fe1a8bdc2265aa --ctx-size 8192 --batch-size 512 --threads 8 --no-mmap --parallel 2 --port 44809"
time=2025-06-09T14:08:42.028Z level=DEBUG source=server.go:432 msg=subprocess OLLAMA_NUM_THREADS=8 OLLAMA_DEBUG=1 PATH=/workspace/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama
time=2025-06-09T14:08:42.029Z level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-09T14:08:42.029Z level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2025-06-09T14:08:42.029Z level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-09T14:08:42.056Z level=INFO source=runner.go:815 msg="starting go runner"
time=2025-06-09T14:08:42.056Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-skylakex.so
time=2025-06-09T14:08:42.090Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-06-09T14:08:42.106Z level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:44809"
llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-6c2e3cd14694a79028e65683420b03ccbd3d7eea1e77448066fe1a8bdc2265aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.1 Nemotron Nano 4B v1.1
llama_model_loader: - kv   3:                            general.version str              = v1.1
llama_model_loader: - kv   4:                       general.organization str              = Nvidia
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.1-Nemotron-Nano
llama_model_loader: - kv   6:                         general.size_label str              = 4B
llama_model_loader: - kv   7:                            general.license str              = other
llama_model_loader: - kv   8:                       general.license.name str              = nvidia-open-model-license
llama_model_loader: - kv   9:                       general.license.link str              = https://www.nvidia.com/en-us/agreemen...
llama_model_loader: - kv  10:                   general.base_model.count u32              = 1
llama_model_loader: - kv  11:                  general.base_model.0.name str              = Llama 3.1 Minitron 4B Width Base
llama_model_loader: - kv  12:          general.base_model.0.organization str              = Nvidia
llama_model_loader: - kv  13:              general.base_model.0.repo_url str              = https://huggingface.co/nvidia/Llama-3...
llama_model_loader: - kv  14:                      general.dataset.count u32              = 1
llama_model_loader: - kv  15:                     general.dataset.0.name str              = Llama Nemotron Post Training Dataset
llama_model_loader: - kv  16:             general.dataset.0.organization str              = Nvidia
llama_model_loader: - kv  17:                 general.dataset.0.repo_url str              = https://huggingface.co/nvidia/Llama-N...
llama_model_loader: - kv  18:                               general.tags arr[str,4]       = ["nvidia", "llama-3", "pytorch", "tex...
llama_model_loader: - kv  19:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  20:                          llama.block_count u32              = 32
llama_model_loader: - kv  21:                       llama.context_length u32              = 131072
llama_model_loader: - kv  22:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv  23:                  llama.feed_forward_length u32              = 9216
llama_model_loader: - kv  24:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  25:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  26:                       llama.rope.freq_base f32              = 3565775104.000000
llama_model_loader: - kv  27:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  28:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  29:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  30:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  31:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  32:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  33:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  34:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  35:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2025-06-09T14:08:42.281Z level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  36:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  37:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  38:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- if tools %}{{- '<|begin_of_text|>...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:   66 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 2.58 GiB (4.91 BPW) 
init_tokenizer: initializing tokenizer for type 2
load: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
load: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
load: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
load: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
load: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
load: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
load: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
load: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
load: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
load: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
load: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
load: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
load: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
load: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
load: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
load: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
load: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
load: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
load: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
load: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
load: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
load: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
load: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
load: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
load: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
load: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
load: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
load: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
load: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
load: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
load: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
load: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
load: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
load: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
load: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
load: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
load: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
load: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
load: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
load: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
load: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
load: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
load: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
load: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
load: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
load: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
load: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
load: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
load: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
load: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
load: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
load: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
load: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
load: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
load: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
load: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
load: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
load: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
load: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
load: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
load: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
load: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
load: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
load: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
load: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
load: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
load: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
load: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
load: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
load: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
load: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
load: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
load: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
load: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
load: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
load: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
load: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
load: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
load: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
load: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
load: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
load: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
load: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
load: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
load: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
load: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
load: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
load: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
load: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
load: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
load: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
load: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
load: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
load: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
load: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
load: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
load: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
load: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
load: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
load: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
load: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
load: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
load: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
load: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
load: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
load: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
load: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
load: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
load: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
load: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
load: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
load: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
load: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
load: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
load: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
load: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
load: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
load: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
load: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
load: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
load: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
load: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
load: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
load: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
load: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
load: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
load: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
load: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
load: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
load: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
load: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
load: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
load: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
load: control token: 128007 '<|end_header_id|>' is not marked as EOG
load: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
load: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
load: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
load: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
load: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
load: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
load: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
load: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
load: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
load: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
load: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
load: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
load: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
load: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
load: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
load: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
load: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
load: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
load: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
load: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
load: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
load: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
load: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
load: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
load: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
load: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
load: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
load: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
load: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
load: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
load: control token: 128001 '<|end_of_text|>' is not marked as EOG
load: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
load: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
load: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
load: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
load: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
load: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
load: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
load: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
load: control token: 128006 '<|start_header_id|>' is not marked as EOG
load: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
load: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
load: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
load: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
load: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
load: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
load: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
load: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
load: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
load: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
load: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
load: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
load: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
load: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
load: control token: 128000 '<|begin_of_text|>' is not marked as EOG
load: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
load: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
load: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
load: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
load: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
load: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
load: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
load: control token: 128010 '<|python_tag|>' is not marked as EOG
load: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
load: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
load: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
load: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
load: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
load: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
load: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
load: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
load: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
load: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
load: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
load: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
load: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
load: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
load: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
load: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
load: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
load: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
load: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
load: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
load: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
load: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
load: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
load: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
load: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
load: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
load: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
load: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
load: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
load: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
load: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
load: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
load: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
load: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
load: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
load: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
load: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
load: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
load: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
load: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
load: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
load: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
load: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
load: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
load: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
load: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
load: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
load: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
load: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
load: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
load: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
load: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
load: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
load: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
load: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
load: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
load: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 3072
print_info: n_layer          = 32
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 9216
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 3565775104.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 8B
print_info: model params     = 4.51 B
print_info: general.name     = Llama 3.1 Nemotron Nano 4B v1.1
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128009 '<|eot_id|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer   0 assigned to device CPU, is_swa = 0
load_tensors: layer   1 assigned to device CPU, is_swa = 0
load_tensors: layer   2 assigned to device CPU, is_swa = 0
load_tensors: layer   3 assigned to device CPU, is_swa = 0
load_tensors: layer   4 assigned to device CPU, is_swa = 0
load_tensors: layer   5 assigned to device CPU, is_swa = 0
load_tensors: layer   6 assigned to device CPU, is_swa = 0
load_tensors: layer   7 assigned to device CPU, is_swa = 0
load_tensors: layer   8 assigned to device CPU, is_swa = 0
load_tensors: layer   9 assigned to device CPU, is_swa = 0
load_tensors: layer  10 assigned to device CPU, is_swa = 0
load_tensors: layer  11 assigned to device CPU, is_swa = 0
load_tensors: layer  12 assigned to device CPU, is_swa = 0
load_tensors: layer  13 assigned to device CPU, is_swa = 0
load_tensors: layer  14 assigned to device CPU, is_swa = 0
load_tensors: layer  15 assigned to device CPU, is_swa = 0
load_tensors: layer  16 assigned to device CPU, is_swa = 0
load_tensors: layer  17 assigned to device CPU, is_swa = 0
load_tensors: layer  18 assigned to device CPU, is_swa = 0
load_tensors: layer  19 assigned to device CPU, is_swa = 0
load_tensors: layer  20 assigned to device CPU, is_swa = 0
load_tensors: layer  21 assigned to device CPU, is_swa = 0
load_tensors: layer  22 assigned to device CPU, is_swa = 0
load_tensors: layer  23 assigned to device CPU, is_swa = 0
load_tensors: layer  24 assigned to device CPU, is_swa = 0
load_tensors: layer  25 assigned to device CPU, is_swa = 0
load_tensors: layer  26 assigned to device CPU, is_swa = 0
load_tensors: layer  27 assigned to device CPU, is_swa = 0
load_tensors: layer  28 assigned to device CPU, is_swa = 0
load_tensors: layer  29 assigned to device CPU, is_swa = 0
load_tensors: layer  30 assigned to device CPU, is_swa = 0
load_tensors: layer  31 assigned to device CPU, is_swa = 0
load_tensors: layer  32 assigned to device CPU, is_swa = 0
load_tensors:          CPU model buffer size =  2642.10 MiB
load_all_data: no device found for buffer type CPU for async uploads
time=2025-06-09T14:08:43.033Z level=DEBUG source=server.go:636 msg="model load progress 0.08"
time=2025-06-09T14:08:43.535Z level=DEBUG source=server.go:636 msg="model load progress 0.20"
time=2025-06-09T14:08:43.785Z level=DEBUG source=server.go:636 msg="model load progress 0.24"
time=2025-06-09T14:08:44.036Z level=DEBUG source=server.go:636 msg="model load progress 0.28"
time=2025-06-09T14:08:44.287Z level=DEBUG source=server.go:636 msg="model load progress 0.31"
time=2025-06-09T14:08:44.538Z level=DEBUG source=server.go:636 msg="model load progress 0.38"
time=2025-06-09T14:08:44.788Z level=DEBUG source=server.go:636 msg="model load progress 0.48"
time=2025-06-09T14:08:45.039Z level=DEBUG source=server.go:636 msg="model load progress 0.58"
time=2025-06-09T14:08:45.290Z level=DEBUG source=server.go:636 msg="model load progress 0.68"
time=2025-06-09T14:08:45.541Z level=DEBUG source=server.go:636 msg="model load progress 0.78"
time=2025-06-09T14:08:45.792Z level=DEBUG source=server.go:636 msg="model load progress 0.88"
time=2025-06-09T14:08:46.042Z level=DEBUG source=server.go:636 msg="model load progress 0.98"
llama_context: constructing llama_context
llama_context: n_seq_max     = 2
llama_context: n_ctx         = 8192
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 1024
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 3565775104.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context:        CPU  output buffer size =     1.00 MiB
create_memory: n_ctx = 8192 (padded)
llama_kv_cache_unified: kv_size = 8192, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1, padding = 32
llama_kv_cache_unified: layer   0: dev = CPU
llama_kv_cache_unified: layer   1: dev = CPU
llama_kv_cache_unified: layer   2: dev = CPU
llama_kv_cache_unified: layer   3: dev = CPU
llama_kv_cache_unified: layer   4: dev = CPU
llama_kv_cache_unified: layer   5: dev = CPU
llama_kv_cache_unified: layer   6: dev = CPU
llama_kv_cache_unified: layer   7: dev = CPU
llama_kv_cache_unified: layer   8: dev = CPU
llama_kv_cache_unified: layer   9: dev = CPU
llama_kv_cache_unified: layer  10: dev = CPU
llama_kv_cache_unified: layer  11: dev = CPU
llama_kv_cache_unified: layer  12: dev = CPU
llama_kv_cache_unified: layer  13: dev = CPU
llama_kv_cache_unified: layer  14: dev = CPU
llama_kv_cache_unified: layer  15: dev = CPU
llama_kv_cache_unified: layer  16: dev = CPU
llama_kv_cache_unified: layer  17: dev = CPU
llama_kv_cache_unified: layer  18: dev = CPU
llama_kv_cache_unified: layer  19: dev = CPU
llama_kv_cache_unified: layer  20: dev = CPU
llama_kv_cache_unified: layer  21: dev = CPU
llama_kv_cache_unified: layer  22: dev = CPU
llama_kv_cache_unified: layer  23: dev = CPU
llama_kv_cache_unified: layer  24: dev = CPU
llama_kv_cache_unified: layer  25: dev = CPU
llama_kv_cache_unified: layer  26: dev = CPU
llama_kv_cache_unified: layer  27: dev = CPU
llama_kv_cache_unified: layer  28: dev = CPU
llama_kv_cache_unified: layer  29: dev = CPU
llama_kv_cache_unified: layer  30: dev = CPU
llama_kv_cache_unified: layer  31: dev = CPU
time=2025-06-09T14:08:46.293Z level=DEBUG source=server.go:636 msg="model load progress 1.00"
time=2025-06-09T14:08:46.544Z level=DEBUG source=server.go:639 msg="model load completed, waiting for server to become available" status="llm server loading model"
llama_kv_cache_unified:        CPU KV buffer size =  1024.00 MiB
llama_kv_cache_unified: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 65536
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context: reserving graph for n_tokens = 1, n_seqs = 1
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context:        CPU compute buffer size =   556.01 MiB
llama_context: graph nodes  = 1094
llama_context: graph splits = 1
time=2025-06-09T14:08:47.046Z level=INFO source=server.go:630 msg="llama runner started in 5.02 seconds"
time=2025-06-09T14:08:47.046Z level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/llama-3-1-nemotron-nano-4b-v1-1-gguf:latest-custom runner.inference=cpu runner.devices=1 runner.size="4.0 GiB" runner.vram="0 B" runner.parallel=2 runner.pid=938 runner.model=/root/.ollama/models/blobs/sha256-6c2e3cd14694a79028e65683420b03ccbd3d7eea1e77448066fe1a8bdc2265aa runner.num_ctx=8192
time=2025-06-09T14:08:47.046Z level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=128 format=""
time=2025-06-09T14:08:47.047Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=18 used=0 remaining=18
[GIN] 2025/06/09 - 14:51:00 | 200 |      51.485µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/06/09 - 14:51:00 | 200 |     792.511µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/06/09 - 14:54:07 | 200 |      44.432µs |       127.0.0.1 | GET      "/api/version"

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions