
[Bug]: Deepseek V3 FP4 LWS PD Crash on B200 during KV Transfer #8204

@bryangopal

Description

System Info

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA B200                    Off |   00000000:03:00.0 Off |                    0 |
| N/A   33C    P0            146W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA B200                    Off |   00000000:13:00.0 Off |                    0 |
| N/A   39C    P0            142W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA B200                    Off |   00000000:63:00.0 Off |                    0 |
| N/A   33C    P0            142W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA B200                    Off |   00000000:73:00.0 Off |                    0 |
| N/A   40C    P0            145W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA B200                    Off |   00000000:83:00.0 Off |                    0 |
| N/A   34C    P0            152W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA B200                    Off |   00000000:93:00.0 Off |                    0 |
| N/A   40C    P0            144W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA B200                    Off |   00000000:E3:00.0 Off |                    0 |
| N/A   34C    P0            142W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA B200                    Off |   00000000:F3:00.0 Off |                    0 |
| N/A   40C    P0            146W / 1000W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
Python 3.12.3
Name: tensorrt_llm
Version: 1.1.0rc2
Summary: TensorRT-LLM: A TensorRT Toolbox for Large Language Models
Home-page: https://github.com/NVIDIA/TensorRT-LLM
Author: NVIDIA Corporation
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.12/dist-packages
Requires: accelerate, aenum, backoff, blake3, blobfile, build, click, click_option_group, colored, cuda-python, cytoolz, datasets, diffusers, einops, etcd3, evaluate, fastapi, flashinfer-python, h5py, jsonschema, lark, llguidance, matplotlib, meson, mpi4py, mpmath, msgspec, ninja, numpy, nvidia-cuda-nvrtc-cu12, nvidia-ml-py, nvidia-modelopt, nvidia-nccl-cu12, nvtx, omegaconf, onnx, onnx_graphsurgeon, openai, opencv-python-headless, optimum, ordered-set, pandas, peft, pillow, polygraphy, prometheus_client, prometheus_fastapi_instrumentator, protobuf, psutil, pulp, pydantic, pydantic-settings, pynvml, pyzmq, sentencepiece, setuptools, soundfile, StrEnum, tensorrt, tiktoken, torch, torchvision, transformers, triton, uvicorn, wheel, xgrammar
Required-by: 
---
Name: tensorrt
Version: 10.11.0.33
Summary: A high performance deep learning inference library
Home-page: https://github.com/nvidia/tensorrt
Author: NVIDIA Corporation
Author-email: 
License: Proprietary
Location: /usr/local/lib/python3.12/dist-packages
Requires: 
Required-by: tensorrt_llm
---
Name: torch
Version: 2.8.0a0+5228986c39.nv25.6
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /usr/local/lib/python3.12/dist-packages
Requires: filelock, fsspec, jinja2, networkx, setuptools, sympy, typing-extensions
Required-by: accelerate, flash_attn, flashinfer-python, lightning-thunder, nvidia-modelopt, nvidia-resiliency-ext, optimum, peft, tensorrt_llm, torchprofile, torchvision, transformer_engine, xgrammar

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am running intra-node 1P1D (one prefill worker, one decode worker) disaggregated serving for DeepSeek V3 FP4 with MTP. Once the servers are warmed up, I run a GPQA benchmark at batch size 1. After some time (more than 50 iterations), the server crashes.
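
The launch scripts below assume MODEL_NAME and PORT are already exported, and at batch size 1 the GPQA benchmark effectively issues one chat-completion request at a time against the orchestrator's OpenAI-compatible endpoint. A minimal sketch of an equivalent single request (the model path, ports, and prompt are placeholders, not the exact values I used):

# Placeholder environment for the scripts below (illustrative values only)
export MODEL_NAME=nvidia/DeepSeek-V3-FP4   # hypothetical path to the FP4 checkpoint
export PORT=8001                           # per-worker port; the orchestrator listens on 8000

# One benchmark-style request against the disaggregated orchestrator
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'"${MODEL_NAME}"'",
        "messages": [{"role": "user", "content": "<GPQA question text>"}],
        "max_tokens": 1024
      }'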

Prefill:

cat >./prefill-extra-llm-api-config.yml<<EOF
    enable_iter_perf_stats: true
    print_iter_log: false
    cuda_graph_config:
        max_batch_size: 32
        enable_padding: false
    moe_config:
        backend: TRTLLM
        max_num_tokens: 32768
    speculative_config:
        decoding_type: MTP
        num_nextn_predict_layers: 3
        #use_relaxed_acceptance_for_thinking: true
        # relaxed_topk: 3
        # relaxed_delta: 0.6
    disable_overlap_scheduler: true
    enable_autotuner: false
    kv_cache_config:
        free_gpu_memory_fraction: 0.4
        enable_block_reuse: true
        enable_partial_reuse: false
    enable_chunked_prefill: true
    scheduler_config:
        context_chunking_policy: EQUAL_PROGRESS
    cache_transceiver_config:
        backend: UCX
        max_tokens_in_buffer: 32768
EOF


export TORCHDYNAMO_DISABLE=1

trtllm-serve "${MODEL_NAME}"\
  --host 0.0.0.0 \
  --port "$PORT" \
  --backend pytorch \
  --max_batch_size 32 \
  --max_num_tokens 32768 \
  --max_seq_len 162000 \
  --tp_size 4 --ep_size 1 \
  --extra_llm_api_options ./prefill-extra-llm-api-config.yml \
  --log_level info

Decode:

cat >./decode-extra-llm-api-config.yml<<EOF
    enable_iter_perf_stats: true
    print_iter_log: false
    cuda_graph_config:
        max_batch_size: 32
        enable_padding: false
    moe_config:
        backend: TRTLLM
        max_num_tokens: 32768
    speculative_config:
        decoding_type: MTP
        num_nextn_predict_layers: 3
        #use_relaxed_acceptance_for_thinking: true
        # relaxed_topk: 3
        # relaxed_delta: 0.6
    disable_overlap_scheduler: false
    enable_autotuner: false
    kv_cache_config:
        free_gpu_memory_fraction: 0.5
        enable_block_reuse: true
        enable_partial_reuse: false
    enable_chunked_prefill: true
    cache_transceiver_config:
        backend: UCX
        max_tokens_in_buffer: 32768
EOF

export TORCHDYNAMO_DISABLE=1

trtllm-serve "${MODEL_NAME}"\
  --host 0.0.0.0 \
  --port "$PORT" \
  --backend pytorch \
  --max_batch_size 32 \
  --max_num_tokens 32768 \
  --max_seq_len 162000 \
  --tp_size 4 --ep_size 1 \
  --extra_llm_api_options ./decode-extra-llm-api-config.yml \
  --log_level info

Orchestrator:

cat >./orchestrator-config.yml<<EOF
    hostname: localhost
    port: 8000
    backend: pytorch
    context_servers:
        num_instances: ${PREFILL_COUNT}
        urls:
${PREFILL_URL_LINES}
    generation_servers:
        num_instances: ${DECODE_COUNT}
        urls:
${DECODE_URL_LINES}
EOF

trtllm-serve disaggregated -c orchestrator-config.yml
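
For reference, with one prefill and one decode worker on the same node (1P1D), the generated orchestrator-config.yml expands to something like the following; the 8001/8002 URLs are placeholders for whatever ${PREFILL_URL_LINES} and ${DECODE_URL_LINES} contain:

hostname: localhost
port: 8000
backend: pytorch
context_servers:
    num_instances: 1
    urls:
        - "localhost:8001"
generation_servers:
    num_instances: 1
    urls:
        - "localhost:8002"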

Expected behavior

The server should keep serving requests without crashing.

Actual behavior

[TensorRT-LLM][ERROR] Exception in DataResponder response: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :0      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7faa1c51ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fa9f5e735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fa9f5e6c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fa9f5e4db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7faed6829ed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7faed6829ed3]
6       0x7fa9f5e6dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fac4af2bdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fac4af2bdb4]
8       0x7faed6824aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7faed6824aa4]
9       0x7faed68b1c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7faed68b1c3c]
[TensorRT-LLM][ERROR] Exception in DataResponder response: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :1      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f2474d1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f244e6735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f244e66c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f244e64db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f2926ccced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f2926ccced3]
6       0x7f244e66dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f26a3be0db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f26a3be0db4]
8       0x7f2926cc7aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f2926cc7aa4]
9       0x7f2926d54c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f2926d54c3c]
[TensorRT-LLM][ERROR] Exception in DataResponder response: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :3      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f990fa1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f98e93735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f98e936c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f98e934db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f9dc198ced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f9dc198ced3]
6       0x7f98e936dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f9b3e4a8db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f9b3e4a8db4]
8       0x7f9dc1987aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f9dc1987aa4]
9       0x7f9dc1a14c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f9dc1a14c3c]
[TensorRT-LLM][ERROR] Exception in DataResponder response: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :2      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7fe2eb91ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fe2c52735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fe2c526c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fe2c524db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7fe79e03aed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7fe79e03aed3]
6       0x7fe2c526dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fe51a31fdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fe51a31fdb4]
8       0x7fe79e035aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fe79e035aa4]
9       0x7fe79e0c2c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fe79e0c2c3c]
[10/08/2025-03:45:31] [TRT-LLM] [RANK 0] [I] statistics={request_id="265", timestamps={received=1759895131.8380768, scheduled=1759895131.8419557, prefill=1759895131.8925006, first_token=0.0, response=1759895131.892659}, counts={prompt_tokens=192, total_tokens=193, completion_tokens=1, cached_tokens=160, prefill={chunks=[{tokens={context=32, decode=0}, requests={context=1, decode=0}, latency=50.32086372375488}], iterations=1}, blocks={context=6, decode=0}, accepted_tokens=0, drafted_tokens=0, acceptance_histogram=[0, 0, 0, 0]}, system={page_size=32, maximum_capacity=32, maximum_blocks=22226, inflight_requests=1, inflight_blocks=6, utilization=0.03125}, preemption={count=0, latency=0.0, tokens=0}, total_time=54.58211898803711, prefill_latency=50.54497718811035, scheduling_latency=3.8788318634033203, ttft=-1759895131838.077, time_per_token=0.1583099365234375, decode_throughput=6316.722891566265, cache_hit_rate=0.8333333333333334, health=false, finish_reason="length"}
[10/08/2025-03:45:31] [TRT-LLM] [RANK 2] [E] Error in event loop: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :2      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7fe2eb91ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fe2c52735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fe2c526c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fe2c524db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7fe79e03aed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7fe79e03aed3]
6       0x7fe2c526dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fe51a31fdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fe51a31fdb4]
8       0x7fe79e035aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fe79e035aa4]
9       0x7fe79e0c2c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fe79e0c2c3c]
[10/08/2025-03:45:31] [TRT-LLM] [RANK 1] [E] Error in event loop: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :1      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f2474d1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f244e6735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f244e66c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f244e64db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f2926ccced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f2926ccced3]
6       0x7f244e66dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f26a3be0db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f26a3be0db4]
8       0x7f2926cc7aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f2926cc7aa4]
9       0x7f2926d54c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f2926d54c3c]
[10/08/2025-03:45:31] [TRT-LLM] [RANK 3] [E] Error in event loop: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :3      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f990fa1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f98e93735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f98e936c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f98e934db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f9dc198ced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f9dc198ced3]
6       0x7f98e936dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f9b3e4a8db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f9b3e4a8db4]
8       0x7f9dc1987aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f9dc1987aa4]
9       0x7f9dc1a14c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f9dc1a14c3c]
[10/08/2025-03:45:31] [TRT-LLM] [RANK 2] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :2      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7fe2eb91ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fe2c52735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fe2c526c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fe2c524db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7fe79e03aed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7fe79e03aed3]
6       0x7fe2c526dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fe51a31fdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fe51a31fdb4]
8       0x7fe79e035aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fe79e035aa4]
9       0x7fe79e0c2c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fe79e0c2c3c]

Exception in thread Thread-7 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[10/08/2025-03:45:31] [TRT-LLM] [RANK 1] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :1      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f2474d1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f244e6735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f244e66c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f244e64db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f2926ccced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f2926ccced3]
6       0x7f244e66dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f26a3be0db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f26a3be0db4]
8       0x7f2926cc7aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f2926cc7aa4]
9       0x7f2926d54c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f2926d54c3c]

Exception in thread Thread-7 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
[10/08/2025-03:45:31] [TRT-LLM] [RANK 0] [E] Error in event loop: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :0      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7faa1c51ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fa9f5e735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fa9f5e6c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fa9f5e4db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7faed6829ed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7faed6829ed3]
6       0x7fa9f5e6dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fac4af2bdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fac4af2bdb4]
8       0x7faed6824aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7faed6824aa4]
9       0x7faed68b1c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7faed68b1c3c]
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 318, in _event_loop_wrapper
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
[10/08/2025-03:45:31] [TRT-LLM] [RANK 3] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :3      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f990fa1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f98e93735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f98e936c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f98e934db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f9dc198ced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f9dc198ced3]
6       0x7f98e936dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f9b3e4a8db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f9b3e4a8db4]
8       0x7f9dc1987aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f9dc1987aa4]
9       0x7f9dc1a14c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f9dc1a14c3c]

             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
Exception in thread Thread-7 (_event_loop_wrapper):
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 318, in _event_loop_wrapper
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :2      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7fe2eb91ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fe2c52735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fe2c526c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fe2c524db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7fe79e03aed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7fe79e03aed3]
6       0x7fe2c526dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fe51a31fdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fe51a31fdb4]
8       0x7fe79e035aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7fe79e035aa4]
9       0x7fe79e0c2c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7fe79e0c2c3c]
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 318, in _event_loop_wrapper
[10/08/2025-03:45:31] [TRT-LLM] [RANK 0] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :0      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7faa1c51ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fa9f5e735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fa9f5e6c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fa9f5e4db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7faed6829ed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7faed6829ed3]
6       0x7fa9f5e6dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fac4af2bdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fac4af2bdb4]
8       0x7faed6824aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7faed6824aa4]
9       0x7faed68b1c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7faed68b1c3c]

    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :1      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f2474d1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f244e6735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f244e66c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f244e64db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f2926ccced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f2926ccced3]
6       0x7f244e66dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f26a3be0db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f26a3be0db4]
8       0x7f2926cc7aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f2926cc7aa4]
9       0x7f2926d54c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f2926d54c3c]
Exception in thread Thread-7 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :3      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7f990fa1ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7f98e93735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7f98e936c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7f98e934db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7f9dc198ced3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7f9dc198ced3]
6       0x7f98e936dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7f9b3e4a8db4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7f9b3e4a8db4]
8       0x7f9dc1987aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7f9dc1987aa4]
9       0x7f9dc1a14c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7f9dc1a14c3c]
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 318, in _event_loop_wrapper
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 311, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1112, in _executor_loop
    ctx_transmission_reqs = self._send_disagg_ctx_cache(scheduled_batch.context_requests)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1558, in _send_disagg_ctx_cache
    self.kv_cache_transceiver.check_context_transfer_status(0)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py", line 120, in check_context_transfer_status
    return self.impl.check_context_transfer_status(at_least_request_num)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: This executor does not have a prepared KV cache for request ID: 41, and the mReadyResponses size is: 1. mpi rank :0      (/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp:261)
1       0x7faa1c51ff66 tensorrt_llm::common::throwRuntimeError(char const*, int, char const*) + 97
2       0x7fa9f5e735b9 tensorrt_llm::batch_manager::DataResponder::Impl::response() + 3353
3       0x7fa9f5e6c81d std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void> >::_M_invoke(std::_Any_data const&) + 45
4       0x7fa9f5e4db5d std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 45
5       0x7faed6829ed3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0xa1ed3) [0x7faed6829ed3]
6       0x7fa9f5e6dba8 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<void (tensorrt_llm::batch_manager::DataResponder::Impl::*)() noexcept, tensorrt_llm::batch_manager::DataResponder::Impl*> >, void>::_M_run() + 248
7       0x7fac4af2bdb4 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xecdb4) [0x7fac4af2bdb4]
8       0x7faed6824aa4 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7faed6824aa4]
9       0x7faed68b1c3c /usr/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7faed68b1c3c]

Additional notes

I also tried rerunning with the overlap scheduler disabled for decode, but hit the same crash.
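
Concretely, that retry used the same decode-extra-llm-api-config.yml with only this flag flipped:

disable_overlap_scheduler: true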

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

    Labels

    KV-Cache Management, Pytorch<NV>, bug, waiting for feedback
