System Info
GPU Type: A6000
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
python3 ./convert_checkpoint.py --model_dir ./Qwen2-0.5B-Instruct --output_dir ./tllm_checkpoint_1gpu_sq --dtype float16 --smoothquant 0.5
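
For context on the failure below: Qwen2-0.5B-Instruct ties lm_head.weight to the input embedding, so both names point at a single storage. A minimal check of my own (not part of the repro; it assumes the checkpoint was cloned as above and that the 0.5B config sets tie_word_embeddings=True) confirms the aliasing:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the freshly cloned checkpoint in the same dtype the conversion uses.
model = AutoModelForCausalLM.from_pretrained(
    "./Qwen2-0.5B-Instruct", torch_dtype=torch.float16
)

# With tied word embeddings, the output head reuses the embedding matrix
# instead of owning its own copy.
emb = model.get_input_embeddings().weight
head = model.get_output_embeddings().weight
print(model.config.tie_word_embeddings)   # expected True for this checkpoint
print(emb.data_ptr() == head.data_ptr())  # True -> one shared storage
```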
Expected behavior
The model converts successfully and the quantized checkpoint is saved to ./tllm_checkpoint_1gpu_sq.
Actual behavior
Cloning into 'Qwen2-0.5B-Instruct'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 33 (delta 12), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (33/33), 3.60 MiB | 6.54 MiB/s, done.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024073000
0.12.0.dev2024073000
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for ccdv/cnn_dailymail contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ccdv/cnn_dailymail
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
calibrating model: 0%| | 0/512 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
calibrating model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:20<00:00, 24.44it/s]
Weights loaded. Total time: 00:00:18
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 309, in <module>
main()
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 301, in main
convert_and_save_hf(args)
File "/app/tensorrt_llm/examples/qwen/./convert_checkpoint.py", line 228, in convert_and_save_hf
QWenForCausalLM.quantize(args.model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 380, in quantize
convert.quantize(hf_model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1207, in quantize
safetensors.torch.save_file(
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 284, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 480, in _flatten
raise RuntimeError(
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.vocab_embedding.weight', 'lm_head.weight'}].
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
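
The RuntimeError reproduces outside TensorRT-LLM whenever two entries of a tensor dict alias the same storage; safetensors rejects aliased tensors by design. A standalone sketch of the failure mode and the obvious workaround, cloning one of the tied weights before saving (the file name is hypothetical; this is not the converter's code):

```python
import torch
import safetensors.torch

weight = torch.zeros(8, 8, dtype=torch.float16)
# Two dict entries backed by the same storage, like the tied
# vocab_embedding / lm_head pair in the Qwen checkpoint.
tensors = {
    "transformer.vocab_embedding.weight": weight,
    "lm_head.weight": weight,
}

try:
    safetensors.torch.save_file(tensors, "rank0.safetensors")
except RuntimeError as e:
    print(e)  # "Some tensors share memory, this will lead to duplicate memory on disk ..."

# Breaking the aliasing with a clone lets save_file succeed.
tensors["lm_head.weight"] = tensors["lm_head.weight"].clone()
safetensors.torch.save_file(tensors, "rank0.safetensors")
```

The `save_model` hint in the error message refers to `safetensors.torch.save_model(model, path)`, which takes an nn.Module and keeps only one copy of each group of shared tensors; since the converter works with a plain tensor dict, cloning the tied weight before the `safetensors.torch.save_file` call in tensorrt_llm/models/qwen/convert.py looks like the simpler fix.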
Additional notes
transformers version: 4.42.4
TensorRT-LLM version: 0.12.0.dev2024073000