Qwen3.6-35B-A3B-NVFP4 Vision model loads with MISSING MoE weights and generates corrupted multilingual output
1. Did you update? pip install --upgrade unsloth unsloth_zoo
Yes.
Studio version: v0.1.45-beta
Package version: 2026.6.2
2. Colab or Kaggle or local / cloud
Local Linux server.
3. Number GPUs used, use nvidia-smi
1 GPU
Hardware detected:
NVIDIA GB10
Max memory: 121.69 GB
During generation:
GPU Memory Usage: ~65-80GB
GPU Utilization: ~90-95%
4. Which notebook? Please link!
Unsloth Studio local deployment.
Not using a public notebook.
5. Which Unsloth version, TRL version, transformers version, PyTorch version?
From logs:
Unsloth 2026.6.2
Transformers 5.5.0
Torch 2.10.0+cu130
CUDA Toolkit 13.0
Triton 3.6.0
Additional package versions if needed:
pip show unsloth
pip show unsloth_zoo
pip show transformers
pip show trl
pip show compressed-tensors
python -c "import torch; print(torch.__version__)"
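The same information can be gathered in one pass. This is a minimal sketch using only the standard library's `importlib.metadata`; the package list mirrors the commands above, and the helper name `collect_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip commands above, plus torch.
PACKAGES = ["unsloth", "unsloth_zoo", "transformers", "trl", "compressed-tensors", "torch"]


def collect_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}")
```

Run it inside the same environment that Studio uses, otherwise the reported versions may differ from the ones in the logs.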
6. Which trainer? SFTTrainer, GRPOTrainer etc
No training.
Inference only.
Model
model_name = "unsloth/Qwen3.6-35B-A3B-NVFP4"
Environment Notes
During startup I see:
Your Flash Attention 2 installation seems to be broken.
Using Xformers instead.
and
The fast path is not available because one of the required library is not installed.
Falling back to torch implementation.
I understand this may affect performance, but I do not think it explains the corrupted outputs described below.
Model Detection
The model is detected as:
model_type=qwen3_5_moe
architectures=['Qwen3_5MoeForConditionalGeneration']
is_vision=True
Therefore Studio correctly identifies it as a Vision model.
Loading Report
After loading weights successfully, I get the following report:
Qwen3_5MoeForConditionalGeneration LOAD REPORT from: unsloth/Qwen3.6-35B-A3B-NVFP4
Key | Status
-------------------------------------------------------------------------+-----------
model.layers.{0...39}.mlp.experts.{0...255}.down_proj.input_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.{0...255}.up_proj.input_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.{0...255}.gate_proj.input_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_packed | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_packed | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_global_scale | UNEXPECTED
model.language_model.layers.{0...39}.mlp.experts.gate_up_proj | MISSING
model.language_model.layers.{0...39}.mlp.experts.down_proj | MISSING
The loader also reports:
MISSING: those params were newly initialized because missing from the checkpoint.
Consider training on your downstream task.
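To compare this report against what is actually stored in the checkpoint, the tensor names can be listed and grouped by pattern. The sketch below assumes the local snapshot is sharded and ships a `model.safetensors.index.json` file with a `weight_map` entry (the usual layout for sharded safetensors checkpoints, but I have not verified it for this repository). The helper names are hypothetical.

```python
import json
import re
import sys
from collections import Counter
from pathlib import Path


def collapse_indices(key):
    """Replace each purely numeric path component with '*' so per-layer keys group together."""
    return re.sub(r"(?<=\.)\d+(?=\.|$)", "*", key)


def summarize_keys(keys):
    """Count how many tensor names share each collapsed pattern."""
    return Counter(collapse_indices(k) for k in keys)


def summarize_index(index_path):
    """Summarize the tensor names listed in a sharded safetensors index file."""
    weight_map = json.loads(Path(index_path).read_text())["weight_map"]
    return summarize_keys(weight_map)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_keys.py /path/to/snapshot/model.safetensors.index.json
    for pattern, count in sorted(summarize_index(sys.argv[1]).items()):
        if ".experts" in pattern:  # only the MoE expert tensors are of interest here
            print(f"{count:6d}  {pattern}")
```

If the printed patterns match the UNEXPECTED rows above, the checkpoint only contains the packed NVFP4 expert tensors, and the plain `gate_up_proj` / `down_proj` tensors would have to be produced by the loader.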
Observed Behavior
The model loads successfully:
Successfully loaded model: unsloth/Qwen3.6-35B-A3B-NVFP4
and memory usage appears reasonable:
GPU Memory [After loading]
64.53GB / 121.69GB
However, generation behavior is abnormal.
Reproduction
Prompt:
No images attached.
Generation starts normally:
and eventually finishes:
Generation duration:
Actual Output
The model produces corrupted multilingual text consisting of random fragments from multiple languages:
对她只开始...
tail突然还是个...
stro Collaboration...
organis橱窗...
总书记不愿...
double bouncing...
[blocked]
The output is completely unrelated to the prompt.
The response contains:
- Chinese fragments
- English fragments
- Arabic fragments
- Japanese fragments
- Random tokens
- Broken words
- Repeated token patterns
This looks like a corrupted decode or a partially initialized model rather than a normal language model response.
Why I Suspect a Loading Issue
The model:
- Loads successfully
- Consumes expected GPU memory
- Completes generation
However:
- Generation takes over 5 minutes for a trivial prompt
- Output is nonsensical
- Output is unrelated to the prompt
- Load report shows many MoE expert weights as MISSING
The suspicious part is:
UNEXPECTED:
*_packed
*_scale
*_global_scale
MISSING:
gate_up_proj
down_proj
which appear to be core MoE expert projection layers.
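The name correspondence can be checked mechanically. The sketch below takes the MISSING and UNEXPECTED rows from the load report as plain strings and pairs each MISSING key with the UNEXPECTED keys that look like its quantized parts. Two things are assumptions on my side, not confirmed loader behavior: that the `model.language_model.` and `model.` prefixes refer to the same modules, and that the `_packed`, `_scale`, and `_global_scale` suffixes belong to the projection of the same name. The function names are hypothetical.

```python
# Suffixes seen on the UNEXPECTED expert tensors in the load report.
QUANT_SUFFIXES = ("_packed", "_scale", "_global_scale")


def normalize(key):
    """Drop the 'language_model.' component so both prefixes in the report compare equal.

    Assumption: both prefixes address the same modules.
    """
    return key.replace("model.language_model.", "model.", 1)


def quantized_siblings(missing, unexpected):
    """Map each MISSING key to the UNEXPECTED keys named <key> + quantization suffix."""
    unexpected_by_norm = {normalize(k): k for k in unexpected}
    pairs = {}
    for key in missing:
        base = normalize(key)
        pairs[key] = [
            unexpected_by_norm[base + suffix]
            for suffix in QUANT_SUFFIXES
            if base + suffix in unexpected_by_norm
        ]
    return pairs


if __name__ == "__main__":
    # Rows copied from the load report above.
    missing = [
        "model.language_model.layers.{0...39}.mlp.experts.gate_up_proj",
        "model.language_model.layers.{0...39}.mlp.experts.down_proj",
    ]
    unexpected = [
        "model.layers.{0...39}.mlp.experts.down_proj_scale",
        "model.layers.{0...39}.mlp.experts.down_proj_global_scale",
        "model.layers.{0...39}.mlp.experts.down_proj_packed",
        "model.layers.{0...39}.mlp.experts.gate_up_proj_packed",
        "model.layers.{0...39}.mlp.experts.gate_up_proj_scale",
        "model.layers.{0...39}.mlp.experts.gate_up_proj_global_scale",
    ]
    for key, siblings in quantized_siblings(missing, unexpected).items():
        print(key)
        for sibling in siblings:
            print("   <-", sibling)
```

Under these assumptions, every MISSING expert projection has a full set of packed siblings in the checkpoint, which would be consistent with the weights being present on disk but not mapped onto the model's parameters.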
This makes me wonder whether:
- NVFP4 packed expert weights are not being correctly mapped.
- Some expert layers are being randomly initialized.
- There is a compatibility issue between:
  - Qwen3.6-35B-A3B-NVFP4
  - Transformers 5.5.0
  - Unsloth Studio 2026.6.2
- The checkpoint format is not being fully restored during loading.
Questions
- Are these UNEXPECTED and MISSING entries expected for this checkpoint?
- Should the following be present after loading?
- Are NVFP4 packed tensors:
  *_packed
  *_scale
  *_global_scale
  supposed to be automatically converted into standard expert projection layers?
- Could these missing expert projections explain the corrupted multilingual output?
- Is there a known issue with:
  unsloth/Qwen3.6-35B-A3B-NVFP4
  under:
  Unsloth Studio v0.1.45-beta
  Unsloth 2026.6.2
  Transformers 5.5.0
- Is this load report expected, or does it indicate an architecture / checkpoint compatibility problem?
Thanks for your help.