Name and Version
This bug is specific to the Python conversion code only; the C/C++ code is not affected.

```sh
$ git rev-parse HEAD
fe5b78c
```
Operating systems
Linux
GGML backends
CPU, BLAS
Hardware
IBM z15 8 IFLs / 64 GB RAIM / 160 GB + 500 GB DASD / NOSMT / LPAR
Models
IBM Granite Vision 3.2 2B F16 (mmproj-model-f16.gguf)
Problem description & steps to reproduce
The Problem
The following machines were used for this test:
- MacBook Air M3 (Little-Endian byte-order)
- IBM z15 Mainframe (Big-Endian byte-order)
Steps to reproduce:
- On both machines, pull the latest code and follow the [README-granitevision.md](https://github.com/ggml-org/llama.cpp/blob/master/examples/llava/README-granitevision.md) instructions.
- On both machines, create the `mmproj-model-f16.gguf` file using the following command:

```sh
python3 /opt/llama-testbed/examples/llava/convert_image_encoder_to_gguf.py \
  -m $ENCODER_PATH/ \
  --llava-projector $ENCODER_PATH/llava.projector \
  --output-dir $ENCODER_PATH/ \
  --clip-model-is-vision \
  --clip-model-is-siglip \
  --image-mean 0.5 0.5 0.5 \
  --image-std 0.5 0.5 0.5 \
  --bigendian
```
- Try using the `mmproj-model-f16.gguf` file generated by each machine on a Big-Endian machine, with the command below. Notice that the `mmproj-model-f16.gguf` generated by the Little-Endian machine works on Big-Endian, but the `mmproj-model-f16.gguf` generated by the Big-Endian machine does not work on Big-Endian.
```sh
build/bin/llama-llava-cli -m /opt/hf_models/granite-vision-3.2-2b.F16.gguf \
  --mmproj $ENCODER_PATH/mmproj-model-f16.gguf \
  --image /opt/llama-testbed/DEMO-TAX-INVOICE-PNG.png \
  -c 16384 \
  -p "<|system|>\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n<|user|>\n<image>\nWhat does the text in this image say?\n<|assistant|>\n" \
  --temp 0
```
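As a quick sanity check that both generated files actually declare a Big-Endian layout in their header, a minimal sketch along these lines can be used. The file path is a placeholder, and it assumes the 4-byte `GGUF` magic is written identically for both byte orders (gguf-py reads it little-endian) and that the `uint32` version field that follows is a small number:

```python
import struct

def gguf_byte_order(path: str) -> str:
    """Guess the byte order of a GGUF file from its version field."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        raw = f.read(4)  # uint32 GGUF version (currently a small number, e.g. 3)
    le = struct.unpack("<I", raw)[0]
    be = struct.unpack(">I", raw)[0]
    # The plausible (small) interpretation tells us how the file was written.
    return "little" if le < be else "big"

# Placeholder path; both files should report "big" when converted with --bigendian.
print(gguf_byte_order("mmproj-model-f16.gguf"))
```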
Problem Identified
- Running a `vimdiff` against both `mmproj-model-f16.gguf` files, I noticed that the `v.head.ffn_up.bias` tensor is not byte-swapped correctly on Big-Endian systems, but is byte-swapped correctly on Little-Endian systems.

```sh
for i in {0..36176..560}; do vimdiff <(xxd -s$i -l560 mmproj-model-f16-le2be.gguf) <(xxd -s$i -l560 mmproj-model-f16.gguf); done
```

`vimdiff` shows that only `v.head.ffn_up.bias` is not byte-swapped correctly. (The left pane shows the correct byteswap; the right pane shows the incorrect byteswap.)
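To confirm the swap without paging through `vimdiff`, a rough sketch like the following compares just the trailing `v.head.ffn_up.bias` payload of the two files. It assumes, as the dump below suggests, that this tensor holds 1152 F32 values and that its data occupies the last 4608 bytes of each file; the file names match the ones used above:

```python
import numpy as np

N_BYTES = 1152 * 4  # v.head.ffn_up.bias: 1152 F32 values

def tail_bytes(path: str) -> np.ndarray:
    # Assumes the last tensor's data sits at the very end of the file.
    return np.fromfile(path, dtype=np.uint8)[-N_BYTES:]

le2be  = tail_bytes("mmproj-model-f16-le2be.gguf")  # converted on the LE machine
native = tail_bytes("mmproj-model-f16.gguf")        # converted on the BE machine

# Reverse the byte order within each 4-byte float of the LE-converted file.
swapped = le2be.reshape(-1, 4)[:, ::-1].reshape(-1)

print("raw bytes identical:         ", np.array_equal(le2be, native))
print("per-float byteswap identical:", np.array_equal(swapped, native))
```

If the second check prints `True`, the file converted on the Big-Endian machine stores this tensor with its bytes in the opposite order of the correctly converted file, i.e. the swap was skipped (or applied twice) for this tensor.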

Running `gguf_dump.py` also shows that `v.head.ffn_up.bias` is the last tensor in the model file.
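The same can be checked programmatically with the `gguf` Python package from gguf-py (a small sketch; the path is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("mmproj-model-f16.gguf")
last = reader.tensors[-1]  # tensors are listed in file order
print(last.name, int(last.n_elements), last.tensor_type.name)
# expected: v.head.ffn_up.bias 1152 F32
```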
First Bad Commit
NIL
Relevant log output
```
python3 gguf_dump.py ~/Documents/hf_models/granite-vision-3.2-2b/visual_encoder/mmproj-model-f16-le2be.gguf
INFO:gguf-dump:* Loading: /Users/taronaeo/Documents/hf_models/granite-vision-3.2-2b/visual_encoder/mmproj-model-f16-le2be.gguf
* File is BIG endian, script is running on a LITTLE endian host.
* Dumping 25 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 451
3: UINT64 | 1 | GGUF.kv_count = 22
4: STRING | 1 | general.architecture = 'clip'
5: BOOL | 1 | clip.has_text_encoder = False
6: BOOL | 1 | clip.has_vision_encoder = True
7: BOOL | 1 | clip.has_llava_projector = True
8: UINT32 | 1 | general.file_type = 1
9: STRING | 1 | general.name = 'siglip-model'
10: STRING | 1 | general.description = 'image encoder for LLaVA'
11: STRING | 1 | clip.projector_type = 'mlp'
12: UINT32 | 1 | clip.vision.image_size = 384
13: UINT32 | 1 | clip.vision.patch_size = 14
14: UINT32 | 1 | clip.vision.embedding_length = 1152
15: UINT32 | 1 | clip.vision.feed_forward_length = 4304
16: UINT32 | 1 | clip.vision.projection_dim = 0
17: UINT32 | 1 | clip.vision.attention.head_count = 16
18: FLOAT32 | 1 | clip.vision.attention.layer_norm_epsilon = 9.999999974752427e-07
19: UINT32 | 1 | clip.vision.block_count = 27
20: [INT32] | 54 | clip.vision.image_grid_pinpoints = [384, 384, 384, 768, 384, 1152, ...]
21: STRING | 1 | clip.vision.mm_patch_merge_type = 'spatial_unpad'
22: [INT32] | 4 | clip.vision.feature_layer = [4, 8, 16, 27]
23: [FLOAT32] | 3 | clip.vision.image_mean = [0.5, 0.5, 0.5]
24: [FLOAT32] | 3 | clip.vision.image_std = [0.5, 0.5, 0.5]
25: BOOL | 1 | clip.use_gelu = False
* Dumping 451 tensor(s)
1: 2048 | 2048, 1, 1, 1 | F32 | mm.0.bias
...truncated...
451: 1152 | 1152, 1, 1, 1 | F32 | v.head.ffn_up.bias
```