## Description
### Name and Version
Ask mradermacher or unsloth
### Operating systems
Linux
### GGML backends
CUDA
### Hardware
Ask mradermacher or unsloth
### Models
Llama-4-Maverick-17B-128E-Instruct-IQ2_M
### Problem description & steps to reproduce
IQ2_M is broken for Llama-4-Maverick-17B-128E-Instruct-GGUF
https://hf.tst.eu/status.html
From the status page, both the Instruct and base-model jobs fail at the IQ2_M step (error/47):

```
nico1 nice size (static/imatrix) -- jobs 9/8-40 maxm 130 free 2815 budget 1523 uploads 95 hfd 585 32c
-7776 804 I Llama-4-Maverick-17B-128E-Instruct error/47 8/24,IQ2_M [20/531]
-3999 804 I Llama-4-Maverick-17B-128E error/47 8/24,IQ2_M [20/531]
```
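For reference, the failing step boils down to an IQ2_M quantization run along these lines (a minimal sketch, not the exact pipeline command; file names are placeholders, and IQ2-class quants need an importance matrix generated beforehand with llama-imatrix):

```sh
# Sketch of the failing step; paths/names are placeholders.
# The imatrix file is produced beforehand with llama-imatrix.
./llama-quantize \
    --imatrix Llama-4-Maverick-17B-128E-Instruct.imatrix \
    Llama-4-Maverick-17B-128E-Instruct-f16.gguf \
    Llama-4-Maverick-17B-128E-Instruct-IQ2_M.gguf \
    IQ2_M
```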
Unsloth ran into the same issue and documented it in their Llama 4 tutorial:
https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-llama-4
> **Interesting Insights and Issues**
>
> During quantization of Llama 4 Maverick (the large model), we found the 1st, 3rd and 45th MoE layers could not be calibrated correctly. Maverick uses interleaving MoE layers for every odd layer, so Dense->MoE->Dense and so on.
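To help triage, the expert tensors of those layers can be listed from the source GGUF with the gguf-dump script that ships with the gguf Python package (a sketch; it assumes llama.cpp's usual blk.N.ffn_{gate,down,up}_exps naming for MoE expert tensors):

```sh
# List the MoE expert tensors of the layers Unsloth flagged (1, 3, 45).
# Requires `pip install gguf`; the file name is a placeholder.
gguf-dump Llama-4-Maverick-17B-128E-Instruct-f16.gguf \
    | grep -E 'blk\.(1|3|45)\.ffn_(gate|down|up)_exps'
```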
Apparently the Scout 16E model can be quantized to IQ2_M successfully; only the Maverick 128E fails. Can this be fixed? An IQ2_M quant should come in at roughly ~135 GB, but because those layers could not be quantized, Unsloth had to fall back to 3-bit and 4-bit for them, blowing the model up to 4 × ~40 GB. And mradermacher's Q2_K is almost 157 GB.

It would be very nice to shave another ~20 GB off by using IQ2_M; a possible stopgap is sketched below.
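Until the root cause is found, one untested stopgap might be to pin only the problematic layers to a higher bit width and let everything else go to IQ2_M. This assumes a llama-quantize build with the --tensor-type per-tensor override and regex pattern support; the pattern, the q4_K choice, and the layer set (1, 3, 45, per the Unsloth report) are assumptions, not a verified fix:

```sh
# Untested sketch: keep the expert tensors of layers 1, 3 and 45 at Q4_K,
# quantize everything else as IQ2_M. --tensor-type takes pattern=type pairs;
# older builds without this flag (or without regex support) cannot do this.
./llama-quantize \
    --imatrix Llama-4-Maverick-17B-128E-Instruct.imatrix \
    --tensor-type 'blk\.(1|3|45)\.ffn_.*_exps=q4_K' \
    Llama-4-Maverick-17B-128E-Instruct-f16.gguf \
    Llama-4-Maverick-17B-128E-Instruct-IQ2_M.gguf \
    IQ2_M
```

If that works, the result should land much closer to the ~135 GB target than the ~160 GB 3/4-bit fallback.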
### First Bad Commit
No response
### Relevant log output
```
nico1 nice size (static/imatrix) -- jobs 9/8-40 maxm 130 free 2815 budget 1523 uploads 95 hfd 585 32c
-7776 804 I Llama-4-Maverick-17B-128E-Instruct error/47 8/24,IQ2_M [20/531]
-3999 804 I Llama-4-Maverick-17B-128E error/47 8/24,IQ2_M [20/531]
```