
Qwen2-1.5B-Instruct convert_checkpoint.py failed #2388

@1994

Description

System Info

  • CPU: x86_64
  • GPU: A10 (24G)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I run the conversion script inside the Docker image:

nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3

command:

python3 convert_checkpoint.py --model_dir <Qwen2-1.5B-Instruct_PATH> --output_dir <Qwen2-1.5B-Instruct_PATH>/tllm --dtype float16

model file:

https://huggingface.co/Qwen/Qwen2-1.5B-Instruct

Exception stack trace:

[10/29/2024-17:08:19] [TRT-LLM] [W] Found pynvml==11.5.3 and cuda driver version 470.182.03. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
229it [00:02, 93.96it/s] 
Traceback (most recent call last):
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 303, in <module>
    main()
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 295, in main
    convert_and_save_hf(args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 251, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 258, in execute
    f(args, rank)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 241, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 427, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 357, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 278, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 391, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0x7f778f992050>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 453, in __del__
    self.release()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 450, in release
    release_gc()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 471, in release_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 901, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
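The crash happens because `Linear.postprocess` receives `None` instead of a weight tensor and calls `.to()` on it. A minimal sketch of that failure mode, with a hypothetical guard (an assumption for illustration, not the library's actual code) that surfaces the missing key clearly instead of the opaque `AttributeError`:

```python
class FakeTensor:
    """Minimal stand-in for torch.Tensor exposing only .to()."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        return FakeTensor(dtype)


def postprocess(weights, dtype):
    # The weights loader passes in whatever tensor it found for a
    # checkpoint key. If the key is absent, `weights` is None and the
    # bare `weights.to(...)` in linear.py raises
    # AttributeError: 'NoneType' object has no attribute 'to'.
    if weights is None:
        raise ValueError("weight tensor is None: key missing from checkpoint")
    return weights.to(dtype)
```

With the guard, `postprocess(None, "float16")` raises a `ValueError` naming the real problem (a missing weight) rather than failing deep inside a dtype cast.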

Expected behavior

The checkpoint conversion completes successfully and writes the TensorRT-LLM checkpoint to the output directory.

Actual behavior

The conversion fails with `AttributeError: 'NoneType' object has no attribute 'to'` (see the stack trace above).

Additional notes

TensorRT-LLM 0.13.0
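One plausible trigger (speculation, not confirmed): Qwen2-1.5B-Instruct appears to use tied word embeddings in its Hugging Face config, so the checkpoint stores no separate `lm_head.weight`; a lookup of that key without a tied-embedding fallback would return `None` and crash exactly where the traceback shows. A sketch of that suspected lookup, with illustrative key names and a dummy checkpoint dict:

```python
# Dummy checkpoint: with tied embeddings, only the embedding matrix
# is stored; there is no separate "lm_head.weight" entry.
checkpoint = {"model.embed_tokens.weight": "shared-embedding"}


def get_weight(key, tie_word_embeddings=True):
    # Without this fallback, a converter asking for "lm_head.weight"
    # gets None, which would later crash in a postprocess .to() call.
    if key == "lm_head.weight" and tie_word_embeddings:
        return checkpoint.get("model.embed_tokens.weight")
    return checkpoint.get(key)
```

If this is the cause, the fix belongs in the converter's handling of tied embeddings rather than in user code.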

Labels

Model customization, Testing, bug, stale, triaged
