
Conversation

@Satrat commented Dec 1, 2023

Issue first noticed on teknium/OpenHermes-2.5-Mistral-7B. When calculating the activation scales, it's possible to get a scale of 0, which causes a NaN weight that errors out the forward pass during quantization calibration.

The fix is to enforce a minimum scale of 1e-5 to avoid a divide by zero (sketched below).
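A minimal sketch of the idea, with illustrative names only (not the exact SparseML equalization code); the 1e-5 floor comes from the PR description:

import torch

MIN_SCALE = 1e-5  # floor from the PR description

def activation_scale(x: torch.Tensor) -> torch.Tensor:
    # per-channel dynamic range of the calibration activations
    scale = x.abs().amax(dim=0)
    # An all-zero channel would otherwise give scale == 0, and the later
    # weight / scale division produces NaN weights that break calibration.
    return torch.clamp(scale, min=MIN_SCALE)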

This PR also adds a seqlen argument to the OBCQ script; when using Mistral's full max sequence length, the perplexity eval was running out of memory.
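Roughly, the new flag looks like the sketch below (assuming obcq.py parses its CLI with argparse; the --seqlen name matches the tested command under Testing, but the default handling here is illustrative):

import argparse

parser = argparse.ArgumentParser(description="OBCQ one-shot script")
parser.add_argument(
    "--seqlen",
    type=int,
    default=None,
    help="sequence length for calibration and perplexity eval; "
         "defaults to the model's max sequence length",
)
args = parser.parse_args()
# downstream (illustrative): seqlen = args.seqlen or model.config.max_position_embeddings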

See the Slack thread for more info on the bug: https://neuralmagic.slack.com/archives/C04SRPGT5MW/p1700515011493959

Testing

src/sparseml/transformers/sparsification/obcq/obcq.py teknium/OpenHermes-2.5-Mistral-7B open_platypus --recipe recipe_mistral.yaml --precision float16 --seqlen 512 --eval wikitext2

Runs to completion now; it previously failed with:

assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan

recipe_mistral.yaml

test_stage:
  obcq_modifiers:
    LogarithmicEqualizationModifier:
      mappings: [
        [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
        [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"]
      ]
    QuantizationModifier:
      ignore:
        # These operations don't make sense to quantize
        - MistralRotaryEmbedding
        - MistralRMSNorm
        - SiLUActivation
        # Skip quantizing the BMMs
        # - QuantizableMatMul
        # Skip quantizing the layers with the most sensitive activations
        - model.layers.1.mlp.down_proj
        - model.layers.31.mlp.down_proj
        - model.layers.30.mlp.down_proj
        - model.layers.30.mlp.gate_proj
        - model.layers.30.mlp.up_proj
      post_oneshot_calibration: true
      scheme_overrides:
        Embedding:
          input_activations: null
          weights:
            num_bits: 8
            symmetric: false
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: true
      quantize: true
      percdamp: 0.01
      mask_structure: "0:0"
      targets: ["re:model.layers.\\d*$"]

Perplexity results:

2023-12-01 16:07:27 sparseml.modifiers.obcq.utils.helpers INFO     Evaluating perplexity...
2023-12-01 16:07:34 sparseml.modifiers.obcq.utils.helpers INFO     tensor(16.5364, device='cuda:4')
2023-12-01 16:07:41 sparseml.modifiers.obcq.utils.helpers INFO     tensor(19.9614, device='cuda:4')
2023-12-01 16:07:49 sparseml.modifiers.obcq.utils.helpers INFO     tensor(17.2977, device='cuda:4')
2023-12-01 16:07:56 sparseml.modifiers.obcq.utils.helpers INFO     tensor(14.8696, device='cuda:4')
2023-12-01 16:08:04 sparseml.modifiers.obcq.utils.helpers INFO     tensor(15.0391, device='cuda:4')
2023-12-01 16:08:11 sparseml.modifiers.obcq.utils.helpers INFO     tensor(15.0188, device='cuda:4')

@Satrat Satrat marked this pull request as ready for review December 1, 2023 15:05
@Satrat Satrat requested a review from anmarques December 1, 2023 16:15
@mgoin mgoin merged commit c722fc3 into main Dec 1, 2023
@mgoin mgoin deleted the smooth_nan_fix branch December 1, 2023 18:22