Closed
Labels
bug: Something isn't working
Description
System Info
- NVIDIA H100 DGX
- CUDA 12.1
- TensorRT-LLM 0.8.0
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Starting from the Falcon examples, I enabled pipeline parallelism (--pp_size 2) and --gather_all_token_logits:
python convert_checkpoint.py --model_dir ./falcon/7b-instruct --dtype bfloat16 --output_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --pp_size 2
trtllm-build --checkpoint_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --gemm_plugin bfloat16 --remove_input_padding enable --gpt_attention_plugin bfloat16 --output_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/ --gather_all_token_logits
python ../summarize.py --test_trt_llm --hf_model_dir ./falcon/7b-instruct --engine_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/
Expected behavior
The run should produce results similar to a run without pipeline parallelism and without gather_all_token_logits.
Actual behavior
The run crashes with a segmentation fault and the following stack trace:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x8
[ 0] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fa73ade1520]
[ 1] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession18executeContextStepERKSt6vectorINS0_15GenerationInputESaIS3_EERKS2_IiSaIiEEPKNS_13batch_manager16kv_cache_manager14KVCacheManagerE+0x5a2)[0x7fa455c9a7c2]
[ 2] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession15generateBatchedERSt6vectorINS0_16GenerationOutputESaIS3_EERKS2_INS0_15GenerationInputESaIS7_EERKNS0_14SamplingConfigERKSt8functionIFvibEE+0xc0b)[0x7fa455c9b89b]
[ 3] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession8generateERNS0_16GenerationOutputERKNS0_15GenerationInputERKNS0_14SamplingConfigE+0xc43)[0x7fa455c9d2f3]
[ 4] /virtualenv/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x42f79)[0x7fa484d80f79]
[ 5] /virtualenv/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d19e)[0x7fa484d6b19e]
[ 6] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x15a10e)[0x55cc0703e10e]
[ 7] python(_PyObject_MakeTpCall+0x25b)[0x55cc07034a7b]
[ 8] python(+0x168acb)[0x55cc0704cacb]
[ 9] python(_PyEval_EvalFrameDefault+0x614a)[0x55cc0702ccfa]
[10] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x1687f1)[0x55cc0704c7f1]
[11] python(PyObject_Call+0x122)[0x55cc0704d492]
[12] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyEval_EvalFrameDefault+0x2a27)[0x55cc070295d7]
[13] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyFunction_Vectorcall+0x7c)[0x55cc0703e9fc]
[14] python(_PyEval_EvalFrameDefault+0x198c)[0x55cc0702853c]
[15] python(_PyFunction_Vectorcall+0x7c)[0x55cc0703e9fc]
Tue Mar 12 12:30:27 2024[1,0]<stderr>:[16] python(_PyEval_EvalFrameDefault+0x6bd)[0x55cc0702726d]
[17] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x13f9c6)[0x55cc070239c6]
[18] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(PyEval_EvalCode+0x86)[0x55cc07119256]
[19] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x260108)[0x55cc07144108]
[20] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x2599cb)[0x55cc0713d9cb]
[21] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x25fe55)[0x55cc07143e55]
[22] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyRun_SimpleFileObject+0x1a8)[0x55cc07143338]
[23] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyRun_AnyFileObject+0x43)[0x55cc07142f83]
[24] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(Py_RunMain+0x2be)[0x55cc07135a5e]
[25] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(Py_BytesMain+0x2d)[0x55cc0710c02d]
[26] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fa73adc8d90]
[27] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fa73adc8e40]
[28] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_start+0x25)[0x55cc0710bf25]
*** End of error message ***
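Many frames in the trace above carry a timestamp and a "[1,0]<stderr>:" tag, which looks like output tagging from the MPI launcher, and this makes the frames harder to read. A minimal sketch for stripping that tag, assuming it always has the shape seen in this log (strip_launcher_prefix is a hypothetical helper name, not part of TensorRT-LLM):

```python
import re

# Matches tags such as "Tue Mar 12 12:30:27 2024[1,0]<stderr>:".
# Assumption: every tag in the log follows this exact layout.
_LAUNCHER_TAG = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
    r"\[\d+,\d+\]<std(?:err|out)>:"
)


def strip_launcher_prefix(line: str) -> str:
    """Remove the timestamp/rank tag wherever it appears in a log line."""
    return _LAUNCHER_TAG.sub("", line)


if __name__ == "__main__":
    sample = (
        "[ 6] Tue Mar 12 12:30:27 2024[1,0]<stderr>:"
        "python(+0x15a10e)[0x55cc0703e10e]"
    )
    print(strip_launcher_prefix(sample))
    # [ 6] python(+0x15a10e)[0x55cc0703e10e]
```

Lines without a tag pass through unchanged, so the helper can be applied to the whole trace.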
If I add --use_py_session, I get the following error:
Traceback (most recent call last):
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 644, in <module>
    main(args)
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 388, in main
    output, *_ = eval_trt_llm(datapoint,
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 233, in eval_trt_llm
    outputs = runner.generate(
  File "/virtualenv/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 642, in generate
    outputs = self._prepare_outputs(outputs, input_lengths)
  File "/virtualenv/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 237, in _prepare_outputs
    context_logits = context_logits.flatten(end_dim=-2)
AttributeError: 'NoneType' object has no attribute 'flatten'
Additional notes
We first saw this error in several of our own tasks that combine logit gathering with pipeline parallelism. It also reproduces with the official examples, so this report is based on them for simplicity.
fjosw