Segmentation fault with pipeline parallelism and gather_all_token_logits #1284

@Marks101

Description

System Info

  • NVIDIA H100 DGX
  • CUDA 12.1
  • TensorRT-LLM 0.8.0

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Starting from the Falcon examples, I enabled pipeline parallelism (--pp_size 2) and --gather_all_token_logits:

python convert_checkpoint.py --model_dir ./falcon/7b-instruct --dtype bfloat16 --output_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --pp_size 2

trtllm-build --checkpoint_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --gemm_plugin bfloat16 --remove_input_padding enable --gpt_attention_plugin bfloat16 --output_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/ --gather_all_token_logits

python ../summarize.py --test_trt_llm --hf_model_dir ./falcon/7b-instruct --engine_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/

Expected behavior

The run should produce results similar to those obtained without pipeline parallelism and without --gather_all_token_logits.

Actual behavior

The run crashes with a segmentation fault and the following stack trace:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x8
[ 0] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fa73ade1520]
[ 1] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession18executeContextStepERKSt6vectorINS0_15GenerationInputESaIS3_EERKS2_IiSaIiEEPKNS_13batch_manager16kv_cache_manager14KVCacheManagerE+0x5a2)[0x7fa455c9a7c2]
[ 2] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession15generateBatchedERSt6vectorINS0_16GenerationOutputESaIS3_EERKS2_INS0_15GenerationInputESaIS7_EERKNS0_14SamplingConfigERKSt8functionIFvibEE+0xc0b)[0x7fa455c9b89b]
[ 3] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession8generateERNS0_16GenerationOutputERKNS0_15GenerationInputERKNS0_14SamplingConfigE+0xc43)[0x7fa455c9d2f3]
[ 4] /virtualenv/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x42f79)[0x7fa484d80f79]
[ 5] /virtualenv/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x2d19e)[0x7fa484d6b19e]
[ 6] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x15a10e)[0x55cc0703e10e]
[ 7] python(_PyObject_MakeTpCall+0x25b)[0x55cc07034a7b]
[ 8] python(+0x168acb)[0x55cc0704cacb]
[ 9] python(_PyEval_EvalFrameDefault+0x614a)[0x55cc0702ccfa]
[10] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x1687f1)[0x55cc0704c7f1]
[11] python(PyObject_Call+0x122)[0x55cc0704d492]
[12] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyEval_EvalFrameDefault+0x2a27)[0x55cc070295d7]
[13] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyFunction_Vectorcall+0x7c)[0x55cc0703e9fc]
[14] python(_PyEval_EvalFrameDefault+0x198c)[0x55cc0702853c]
[15] python(_PyFunction_Vectorcall+0x7c)[0x55cc0703e9fc]
Tue Mar 12 12:30:27 2024[1,0]<stderr>:[16] python(_PyEval_EvalFrameDefault+0x6bd)[0x55cc0702726d]
[17] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x13f9c6)[0x55cc070239c6]
[18] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(PyEval_EvalCode+0x86)[0x55cc07119256]
[19] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x260108)[0x55cc07144108]
[20] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x2599cb)[0x55cc0713d9cb]
[21] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(+0x25fe55)[0x55cc07143e55]
[22] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyRun_SimpleFileObject+0x1a8)[0x55cc07143338]
[23] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_PyRun_AnyFileObject+0x43)[0x55cc07142f83]
[24] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(Py_RunMain+0x2be)[0x55cc07135a5e]
[25] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(Py_BytesMain+0x2d)[0x55cc0710c02d]
[26] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fa73adc8d90]
[27] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fa73adc8e40]
[28] Tue Mar 12 12:30:27 2024[1,0]<stderr>:python(_start+0x25)[0x55cc0710bf25]
*** End of error message ***
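
The trace above is hard to scan because the launcher's tagged-output prefix (timestamp, rank, `<stderr>:`) is interleaved with the frames. The following is a minimal sketch, not part of TensorRT-LLM, that strips that prefix and extracts the frame index, module, symbol, and offset. The names `TAG_PREFIX`, `FRAME`, and `parse_frames` are hypothetical helpers, and the regular expressions assume the exact line layout shown in this report.

```python
import re

# Tagged-output prefix as it appears in this report,
# e.g. "Tue Mar 12 12:30:27 2024[1,0]<stderr>:" (layout assumed from the trace).
TAG_PREFIX = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} \d{4}"
    r"\[\d+,\d+\]<std(?:err|out)>:"
)

# Frame layout: "[ N] <module>(<symbol>+<offset>)[<address>]"; symbol may be empty.
FRAME = re.compile(
    r"\[\s*(\d+)\]\s*(\S+?)\((.*?)\+(0x[0-9a-fA-F]+)\)\[(0x[0-9a-fA-F]+)\]"
)


def parse_frames(trace: str) -> list:
    """Return one dict per recognizable backtrace frame, in input order."""
    frames = []
    for line in trace.splitlines():
        cleaned = TAG_PREFIX.sub("", line)  # drop the interleaved prefix
        match = FRAME.search(cleaned)
        if match is None:
            continue  # signal headers and other non-frame lines
        index, module, symbol, offset, address = match.groups()
        frames.append({
            "index": int(index),
            "module": module.rsplit("/", 1)[-1],  # file name only
            "symbol": symbol or None,             # None when no symbol is exported
            "offset": offset,
            "address": address,
        })
    return frames


# Three lines copied from the trace in this report.
SAMPLE = r"""
[ 0] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fa73ade1520]
[ 3] Tue Mar 12 12:30:27 2024[1,0]<stderr>:/virtualenv/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSession8generateERNS0_16GenerationOutputERKNS0_15GenerationInputERKNS0_14SamplingConfigE+0xc43)[0x7fa455c9d2f3]
Tue Mar 12 12:30:27 2024[1,0]<stderr>:[16] python(_PyEval_EvalFrameDefault+0x6bd)[0x55cc0702726d]
"""

frames = parse_frames(SAMPLE)
for frame in frames:
    print(frame["index"], frame["module"], frame["symbol"], frame["offset"])
```

The mangled symbols can then be passed to a demangler such as `c++filt`. Read that way, frames 1 to 3 are `GptSession::executeContextStep`, `GptSession::generateBatched`, and `GptSession::generate`.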

If I add --use_py_session, I get the following error:

Traceback (most recent call last):
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 644, in <module>
    main(args)
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 388, in main
    output, *_ = eval_trt_llm(datapoint,
  File "/TensorRT-LLM/examples/falcon/../summarize.py", line 233, in eval_trt_llm
    outputs = runner.generate(
  File "/virtualenv/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 642, in generate
    outputs = self._prepare_outputs(outputs, input_lengths)
  File "/virtualenv/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 237, in _prepare_outputs
    context_logits = context_logits.flatten(end_dim=-2)
AttributeError: 'NoneType' object has no attribute 'flatten'
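
The traceback shows that `context_logits` is `None` when `_prepare_outputs` calls `flatten(end_dim=-2)` on it. The sketch below reproduces that failure mode in isolation and shows one possible defensive guard. It is an illustration only: `FakeLogits` and `prepare_context_logits` are hypothetical stand-ins, not TensorRT-LLM code, and it assumes (unconfirmed by this report) that a process may legitimately have no context logits to return, for example a pipeline stage that does not produce them. Whether skipping the step is the right fix is for the maintainers to decide.

```python
from typing import List, Optional


class FakeLogits:
    """Hypothetical stand-in for a 3-D logits tensor laid out as [batch][seq][vocab]."""

    def __init__(self, data: List):
        self.data = data

    def flatten(self, end_dim: int = -2) -> "FakeLogits":
        # Only the case used in the traceback is modeled: merge every
        # dimension except the last, so [batch][seq][vocab] -> [batch*seq][vocab].
        if end_dim != -2:
            raise NotImplementedError("sketch only models end_dim=-2")
        return FakeLogits([row for sequence in self.data for row in sequence])


def prepare_context_logits(context_logits: Optional[FakeLogits]) -> Optional[FakeLogits]:
    """Flatten context logits when present; pass None through unchanged."""
    if context_logits is None:
        # Assumption: no logits are available in this process, so skip the step.
        return None
    return context_logits.flatten(end_dim=-2)


# Reproduce the reported error type with a bare None.
missing = None
try:
    missing.flatten(end_dim=-2)
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The guarded version tolerates None and still flattens real data.
batch = FakeLogits([[[0.1, 0.9], [0.4, 0.6]], [[0.7, 0.3], [0.2, 0.8]]])
flat = prepare_context_logits(batch)
print(len(flat.data), len(flat.data[0]))
```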

Additional notes

We noticed this error in several tasks that require both gathering logits and pipeline parallelism, and we were able to reproduce it with the official examples. For simplicity, this issue description is based on that reproduction.

Labels

bug: Something isn't working
