Open
Labels
Model customization (adding support for new model architectures or variants) · Testing (continuous integration, build system, and testing infrastructure issues) · bug (something isn't working) · stale · triaged (issue has been triaged by maintainers)
Description
System Info
- CPU: x86_64
- GPU: A10 (24G)
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I executed the convert script inside the Docker image:
nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3
command:
python3 convert_checkpoint.py --model_dir <Qwen2-1.5B-Instruct_PATH> --output_dir <Qwen2-1.5B-Instruct_PATH>/tllm --dtype float16
model file:
exception stack:
[10/29/2024-17:08:19] [TRT-LLM] [W] Found pynvml==11.5.3 and cuda driver version 470.182.03. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
229it [00:02, 93.96it/s]
Traceback (most recent call last):
File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 303, in <module>
main()
File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 295, in main
convert_and_save_hf(args)
File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 251, in convert_and_save_hf
execute(args.workers, [convert_and_save_rank] * world_size, args)
File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 258, in execute
f(args, rank)
File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 241, in convert_and_save_rank
qwen = QWenForCausalLM.from_hugging_face(
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 427, in from_hugging_face
loader.generate_tllm_weights(model)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 357, in generate_tllm_weights
self.load(tllm_key,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 278, in load
v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 391, in postprocess
weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0x7f778f992050>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 453, in __del__
self.release()
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 450, in release
release_gc()
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 471, in release_gc
torch.cuda.ipc_collect()
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 901, in ipc_collect
_lazy_init()
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
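For context, the AttributeError at the bottom of the stack can be reproduced in isolation: `postprocess` in `tensorrt_llm/layers/linear.py` calls `weights.to(...)`, which assumes the weights loader actually produced a tensor. The sketch below is a hypothetical stand-in for that call, not the real implementation; it only shows that a `None` weight (e.g. a checkpoint tensor the loader could not match) fails exactly this way.

```python
# Hypothetical stand-in for tensorrt_llm/layers/linear.py postprocess():
# it calls .to() on the loaded weight, assuming the loader returned a tensor.
def postprocess(weights, dtype="float16"):
    # If the weights loader found no matching tensor, weights is None,
    # and None has no .to() method.
    return weights.to(dtype)

try:
    postprocess(None)
except AttributeError as e:
    # Same message as in the traceback above:
    print(e)  # 'NoneType' object has no attribute 'to'
```

This suggests the root cause is upstream of `postprocess`: for some `tllm_key`, `ModelWeightsLoader.load` passes `None` instead of a tensor, so the checkpoint layout may not match what the 0.13.0 Qwen loader expects.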
Expected behavior
The checkpoint conversion completes successfully.
Actual behavior
The conversion fails with the AttributeError shown above.
Additional notes
TensorRT-LLM 0.13.0