Possible NVFP4 Loading Issue with Qwen3.6-35B-A3B-NVFP4 #6224

# Qwen3.6-35B-A3B-NVFP4 Vision model loads with MISSING MoE weights and generates corrupted multilingual output

## 1. Did you update? `pip install --upgrade unsloth unsloth_zoo`

Yes.

Studio version:

```text
v0.1.45-beta
```

Unsloth package version:

```text
2026.6.2
```

---

## 2. `Colab` or `Kaggle` or local / cloud

Local Linux server.

---

## 3. Number GPUs used, use `nvidia-smi`

1 GPU

Hardware detected:

```text
NVIDIA GB10
Max memory: 121.69 GB
```

During generation:

```text
GPU Memory Usage: ~65-80GB
GPU Utilization: ~90-95%
```

---

## 4. Which notebook? Please link!

Unsloth Studio local deployment.

Not using a public notebook.

---

## 5. Which Unsloth version, TRL version, transformers version, PyTorch version?

From logs:

```text
Unsloth 2026.6.2
Transformers 5.5.0
Torch 2.10.0+cu130
CUDA Toolkit 13.0
Triton 3.6.0
```

I can provide additional package versions if needed, collected with:

```bash
pip show unsloth
pip show unsloth_zoo
pip show transformers
pip show trl
pip show compressed-tensors
python -c "import torch; print(torch.__version__)"
```
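The same information could also be gathered in one pass from Python. This is a minimal sketch using only the standard library. The package list mirrors the commands above, with `torch` added as an assumption so that everything is reported in the same format.

```python
from importlib import metadata

# Package names taken from the pip commands above; "torch" is added here
# as an assumption so PyTorch is reported alongside the others.
PACKAGES = [
    "unsloth",
    "unsloth_zoo",
    "transformers",
    "trl",
    "compressed-tensors",
    "torch",
]


def collect_versions(packages=PACKAGES):
    """Return {package: version} using installed metadata only.

    Packages are not imported, so a broken install cannot crash the report.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}")
```

Reading metadata instead of importing each package keeps the report usable even when an import itself fails, as the Flash Attention 2 warning suggests can happen in this environment.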

---

## 6. Which trainer? `SFTTrainer`, `GRPOTrainer` etc

No training.

Inference only.

---

# Model

```python
model_name = "unsloth/Qwen3.6-35B-A3B-NVFP4"
```

---

# Environment Notes

During startup I see:

```text
Your Flash Attention 2 installation seems to be broken.
Using Xformers instead.
```

and

```text
The fast path is not available because one of the required library is not installed.
Falling back to torch implementation.
```

I understand this may affect performance, but I do not think it explains the corrupted outputs described below.

---

# Model Detection

The model is detected as:

```text
model_type=qwen3_5_moe
architectures=['Qwen3_5MoeForConditionalGeneration']
is_vision=True
```

Therefore Studio correctly identifies it as a Vision model.

---

# Loading Report

Once the weights finish loading, the loader prints the following report:

```text
Qwen3_5MoeForConditionalGeneration LOAD REPORT from: unsloth/Qwen3.6-35B-A3B-NVFP4

Key                                                                      | Status
-------------------------------------------------------------------------+-----------
model.layers.{0...39}.mlp.experts.{0...255}.down_proj.input_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.{0...255}.up_proj.input_global_scale   | UNEXPECTED
model.layers.{0...39}.mlp.experts.{0...255}.gate_proj.input_global_scale | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_scale                        | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_global_scale                 | UNEXPECTED
model.layers.{0...39}.mlp.experts.down_proj_packed                       | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_packed                    | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_scale                     | UNEXPECTED
model.layers.{0...39}.mlp.experts.gate_up_proj_global_scale              | UNEXPECTED

model.language_model.layers.{0...39}.mlp.experts.gate_up_proj            | MISSING
model.language_model.layers.{0...39}.mlp.experts.down_proj               | MISSING
```

The loader also reports:

```text
MISSING: those params were newly initialized because missing from the checkpoint.
Consider training on your downstream task.
```
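To check what the checkpoint itself stores, independent of the loader, the tensor names can be grouped by pattern. The sketch below assumes the checkpoint is a sharded safetensors download with the usual `model.safetensors.index.json` file next to the shards. I have not confirmed that this repository is laid out that way, and `checkpoint_dir` is a hypothetical local path.

```python
import json
import re
from collections import Counter
from pathlib import Path


def summarize_expert_keys(keys):
    """Count MoE expert tensors per name pattern.

    Numeric path components (layer and expert indices) are collapsed to "*"
    so that all layers and experts fall into the same bucket.
    """
    counts = Counter()
    for key in keys:
        if ".mlp.experts." not in key:
            continue
        pattern = re.sub(r"\.\d+(?=\.|$)", ".*", key)
        counts[pattern] += 1
    return counts


def load_checkpoint_keys(checkpoint_dir):
    """Read tensor names from a sharded safetensors index file.

    Assumes `model.safetensors.index.json` exists in `checkpoint_dir`.
    """
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with open(index_path, encoding="utf-8") as f:
        return list(json.load(f)["weight_map"])
```

Calling `summarize_expert_keys(load_checkpoint_keys(checkpoint_dir))` would show whether the files on disk contain only `*_packed` / `*_scale` / `*_global_scale` expert tensors, or also plain `gate_up_proj` and `down_proj` tensors.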

---

# Observed Behavior

The model loads successfully:

```text
Successfully loaded model: unsloth/Qwen3.6-35B-A3B-NVFP4
```

and memory usage appears reasonable:

```text
GPU Memory [After loading]
64.53GB / 121.69GB
```

However, generation is abnormal: a one-line prompt takes minutes to complete and the output is unreadable.

---

# Reproduction

Prompt (Chinese for "What model are you?"):

```text
你是什么模型？
```

No images attached.

Generation starts normally:

```text
Starting text generation
```

and eventually finishes:

```text
Finished text generation
```

Generation duration:

```text
324.47 seconds
```

---

# Actual Output

The model produces garbled text that mixes unrelated fragments from several languages:

```text
对她只开始...
tail突然还是个...
stro Collaboration...
organis橱窗...
总书记不愿...
double bouncing...
[blocked]
```

The output is completely unrelated to the prompt.

The response contains:

* Chinese fragments
* English fragments
* Arabic fragments
* Japanese fragments
* Random tokens
* Broken words
* Repeated token patterns

It appears similar to a corrupted decode or partially initialized model rather than a normal language model response.
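To put a number on the language mixing, the alphabetic characters of a response can be counted per script. This is only a rough heuristic sketch: it uses the first word of each character's Unicode name as a stand-in for its script, which is an approximation and not a real script property lookup.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count alphabetic characters by the first word of their Unicode name.

    For example, Latin letters are counted under "LATIN" and Chinese
    ideographs under "CJK". Punctuation, digits, and spaces are ignored.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts
```

A normal answer to a Chinese prompt would be expected to be dominated by one or two buckets. Output spread across many buckets within a single short line, such as the `organis橱窗...` fragment above, is consistent with near-random token sampling.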

---

# Why I Suspect a Loading Issue

The model:

* Loads successfully
* Consumes expected GPU memory
* Completes generation

However:

* Generation takes over 5 minutes (324.47 seconds) for a one-line prompt
* Output is nonsensical
* Output is unrelated to the prompt
* The load report marks the MoE expert `gate_up_proj` and `down_proj` weights as MISSING in all 40 layers (0 to 39)

The suspicious part is:

```text
UNEXPECTED:
*_packed
*_scale
*_global_scale

MISSING:
gate_up_proj
down_proj
```

The MISSING entries appear to be the core MoE expert projection weights, and the UNEXPECTED entries look like their NVFP4-packed counterparts.

This makes me wonder whether:

1. NVFP4 packed expert weights are not being correctly mapped.
2. Some expert layers are being randomly initialized.
3. There is a compatibility issue between:

   * Qwen3.6-35B-A3B-NVFP4
   * Transformers 5.5.0
   * Unsloth 2026.6.2 (Studio v0.1.45-beta)
4. The checkpoint format is not being fully restored during loading.
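The first hypothesis can be illustrated directly from the load report. In the sketch below, the UNEXPECTED names are reduced to a base name by dropping the NVFP4 suffixes, and the MISSING names are normalized by treating `model.language_model.` as equivalent to `model.`. Both rules are my reading of the report, not confirmed loader behavior. If every MISSING parameter has a packed counterpart under those rules, the data is present in the checkpoint and only the name mapping fails.

```python
# Suffixes seen on the UNEXPECTED tensors in the load report.
# "_global_scale" must be checked before "_scale" because it also ends in "_scale".
NVFP4_SUFFIXES = ("_packed", "_global_scale", "_scale")


def base_name(key):
    """Normalize a tensor name for comparison (assumed rules, see lead-in)."""
    key = key.replace("model.language_model.", "model.", 1)
    for suffix in NVFP4_SUFFIXES:
        if key.endswith(suffix):
            return key[: -len(suffix)]
    return key


def unmatched_missing(unexpected, missing):
    """Return MISSING names that have no packed counterpart among UNEXPECTED."""
    available = {base_name(key) for key in unexpected}
    return [key for key in missing if base_name(key) not in available]


# Name patterns copied from the load report above.
UNEXPECTED = [
    "model.layers.{0...39}.mlp.experts.down_proj_scale",
    "model.layers.{0...39}.mlp.experts.down_proj_global_scale",
    "model.layers.{0...39}.mlp.experts.down_proj_packed",
    "model.layers.{0...39}.mlp.experts.gate_up_proj_packed",
    "model.layers.{0...39}.mlp.experts.gate_up_proj_scale",
    "model.layers.{0...39}.mlp.experts.gate_up_proj_global_scale",
]
MISSING = [
    "model.language_model.layers.{0...39}.mlp.experts.gate_up_proj",
    "model.language_model.layers.{0...39}.mlp.experts.down_proj",
]

if __name__ == "__main__":
    print(unmatched_missing(UNEXPECTED, MISSING))
```

Under these assumptions, both MISSING patterns line up with UNEXPECTED packed tensors, which would point toward a key-mapping problem and not toward an incomplete checkpoint.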

---

# Questions

1. Are these `UNEXPECTED` and `MISSING` entries expected for this checkpoint?

2. Should the following be present after loading?

```text
gate_up_proj
down_proj
```

3. Are NVFP4 packed tensors:

```text
*_packed
*_scale
*_global_scale
```

supposed to be automatically converted into standard expert projection layers?

4. Could these missing expert projections explain the corrupted multilingual output?

5. Is there a known issue with:

```text
unsloth/Qwen3.6-35B-A3B-NVFP4
```

under:

```text
Unsloth Studio v0.1.45-beta
Unsloth 2026.6.2
Transformers 5.5.0
```

6. Is this load report expected, or does it indicate an architecture / checkpoint compatibility problem?

Thanks for your help.

