System Info
GPU: NVIDIA L20
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I am trying to quantize CodeQwen1.5 7B Chat to FP8 using a modified version of the example quantization script:
python quantization/quantize.py --model_dir /mnt/models/CodeQwen1.5-7B-Chat \
--dtype float16 \
--qformat fp8 \
--kv_cache_dtype fp8 \
--output_dir /mnt/trt_models/codeqwen1.5_7b_checkpoint_1gpu_fp8_fp8kv \
--calib_size 512 \
--calib_dataset /mnt/dataset/cnn_dailymail
Expected behavior
The example quantization/quantize.py script calls quantize_and_export() to run the quantization, which is defined in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize_by_modelopt.py. There, get_tokenizer should automatically load the tokenizer from my model_dir and set the pad_token as well as the eos_token.
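In other words, calling get_tokenizer on the model directory by itself should already return a tokenizer with a usable pad_token. A minimal sketch of that expectation (importing straight from the module named in the traceback; model_type="qwen" matches the value shown in the assertion message):

from tensorrt_llm.quantization.quantize_by_modelopt import get_tokenizer

# Expected: a tokenizer whose pad_token and eos_token are both set.
tokenizer = get_tokenizer("/mnt/models/CodeQwen1.5-7B-Chat",
                          max_seq_length=2048,
                          model_type="qwen")
print(tokenizer.pad_token, tokenizer.eos_token)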
actual behavior
But it failed to set the pad_token:
[07/16/2024-13:46:30] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[07/16/2024-13:46:30] [TRT-LLM] [I] Starting TensorRT-LLM init.
[TensorRT-LLM][INFO] Set logger level by INFO
[07/16/2024-13:46:30] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Initializing model from /mnt/models/CodeQwen1.5-7B-Chat
[07/16/2024-13:47:14] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:28<00:00, 7.20s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /mnt/models/CodeQwen1.5-7B-Chat
Traceback (most recent call last):
File "quantization/quantize.py", line 90, in <module>
quantize_and_export(
File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 289, in quantize_and_export
tokenizer = get_tokenizer(model_dir,
File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 147, in get_tokenizer
assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
AssertionError: Pad token for qwen cannot be set!
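The failure seems to come from the qwen-specific branch in get_tokenizer. A minimal probe of the tokenizer alone (a sketch; my guess is that the hard-coded id 151643, which Qwen1.5 uses as its pad/eos token, does not map to a token in CodeQwen1.5's tokenizer, so the subsequent pad_token assignment ends up with nothing usable):

from transformers import AutoTokenizer

# Load the tokenizer exactly the way get_tokenizer() does.
tokenizer = AutoTokenizer.from_pretrained(
    "/mnt/models/CodeQwen1.5-7B-Chat",
    model_max_length=2048,
    padding_side="left",
    trust_remote_code=True,
)

# Probe the hard-coded id that get_tokenizer assigns as pad/eos for "qwen".
print("vocab size:", tokenizer.vocab_size)
try:
    print("token for id 151643:", tokenizer.convert_ids_to_tokens(151643))
except Exception as exc:  # out-of-range ids may raise, depending on the tokenizer class
    print("convert_ids_to_tokens(151643) raised:", exc)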
additional notes
To get this case to work, I commented out every line except the AutoTokenizer.from_pretrained() call:
def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )
    # if model_type and model_type == "qwen":
    #     # qwen use token id 151643 as pad and eos tokens
    #     tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
    #     tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)

    # # can't set attribute 'pad_token' for "<unk>"
    # if tokenizer.pad_token != "<unk>":  # nosec B105
    #     tokenizer.pad_token = tokenizer.eos_token
    # if tokenizer.pad_token is None:
    #     tokenizer.pad_token = tokenizer.eos_token
    # assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"

    return tokenizer
I know that commenting out these lines will certainly affect other models' conversion, so it seems this function needs a proper fix to support CodeQwen1.5.
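One possible direction for such a fix (a sketch only, not a tested patch; it drops the unconditional pad_token = eos_token overwrite, which would need to be checked against the other model types this function supports): only apply the qwen-specific override when the hard-coded id actually resolves to a token, and otherwise fall back to eos_token/unk_token.

from transformers import AutoTokenizer

def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )
    if model_type and model_type == "qwen":
        # Qwen(1.5) uses token id 151643 as pad and eos tokens, but CodeQwen1.5
        # does not appear to carry this id, so only override when it resolves.
        qwen_pad = tokenizer.convert_ids_to_tokens(151643)
        if qwen_pad is not None:
            tokenizer.pad_token = qwen_pad
            tokenizer.eos_token = qwen_pad
    # Fall back to eos_token (or unk_token) when no pad token is available.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token or tokenizer.unk_token
    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
    return tokenizer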