
Regression in Gemma-3n model quantization support #1765

@shubhra

Description

Summary

  • Re-quantizing Gemma-3n-E2B-it to W4A16 with the same recipe that previously worked now yields models with random / very poor accuracy on OpenLLM benchmarks. The regression is suspected to affect both E2B and E4B, but has only been reproduced on E2B so far.
  • The regression localizes to self_attn.o_proj.weight_packed for layers ≥ 20: in the bad model these packed weights collapse to a constant 0x88 byte pattern, while the other packed tensors (q/k/v, MLP) stay statistically close to the good model. A detection sketch follows below.
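
One quick check that does not require running evals is to scan the saved checkpoint for packed weights whose raw bytes are constant (u8 std == 0). The sketch below is not the original diff script; it assumes the checkpoint is stored as safetensors shards, and the helper name and directory path are placeholders.

import glob
import torch
from safetensors import safe_open

def find_collapsed_packed_weights(model_dir: str):
    """Return (tensor_name, byte) for every *.weight_packed tensor whose bytes are all identical."""
    collapsed = []
    for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
        with safe_open(shard, framework="pt") as f:
            for name in f.keys():
                if not name.endswith(".weight_packed"):
                    continue
                u8 = f.get_tensor(name).view(torch.uint8)  # reinterpret packed int32 as raw bytes
                if u8.float().std() == 0:                  # every byte identical, e.g. 0x88
                    collapsed.append((name, int(u8.flatten()[0])))
    return collapsed

# Placeholder path; per the finding above, the bad model should flag
# self_attn.o_proj.weight_packed for layers >= 20.
for name, byte in find_collapsed_packed_weights("gemma-3n-E2B-it-quantized.w4a16"):
    print(f"{name}: constant byte 0x{byte:02x}")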

Suspected cause

  • Recent changes to the Gemma-3n model definition in Transformers and/or to its autowrap module mapping.

Diff of the two models obtained with the same recipe:
A - older run, good model
B - newer run with the same recipe; this model produces random accuracies

model.language_model.layers.20.self_attn.o_proj.weight_packed
  A: model-00001-of-00003.safetensors
     shape=(2048, 256) dtype=torch.int32 bytes=2097152
     u8_mean=136.000 u8_std=0.000 sum32=285212672 l2=1.969e+05
     hex[:16]=88888888888888888888888888888888
  B: model-00001-of-00003.safetensors
     shape=(2048, 256) dtype=torch.int32 bytes=2097152
     u8_mean=135.991 u8_std=43.678 sum32=285193004 l2=2.068e+05
     hex[:16]=46885797b66863cca9685cb3637b3a77

(Other tensors like q_proj, mlp.up/gate/down_proj are statistically similar across A/B.)
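
For reference, a sketch of how the per-tensor statistics above can be computed (not necessarily the exact script used; it assumes the stats are taken over the raw bytes of the packed int32 tensor, which matches the numbers shown):

import torch
from safetensors import safe_open

def packed_weight_stats(shard_path: str, tensor_name: str):
    with safe_open(shard_path, framework="pt") as f:
        t = f.get_tensor(tensor_name)
    u8 = t.view(torch.uint8)          # raw bytes of the int32 packing
    u8d = u8.to(torch.float64)        # double precision to keep sums exact
    return {
        "shape": tuple(t.shape),
        "dtype": t.dtype,
        "bytes": u8.numel(),
        "u8_mean": u8d.mean().item(),
        "u8_std": u8d.std().item(),
        "sum32": int(u8.to(torch.int64).sum().item()),  # byte sum; matches the sum32 values above
        "l2": u8d.norm().item(),                        # L2 norm of the bytes; matches l2 above
        "hex[:16]": u8.flatten()[:16].numpy().tobytes().hex(),
    }

print(packed_weight_stats(
    "model-00001-of-00003.safetensors",
    "model.language_model.layers.20.self_attn.o_proj.weight_packed",
))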

Model A: https://huggingface.co/RedHatAI/gemma-3n-E2B-it-quantized.w4a16
Model B: local model, can share if needed.
