Closed
Labels: Inference runtime (General operational aspects of TRTLLM execution not in other categories), bug (Something isn't working)
Description
System Info
GPU: A100
Docker image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc4
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I served the model as follows:
```shell
trtllm-serve /app/data/gemma-3-4b-it/model --max_batch_size 128 --host 0.0.0.0 \
    --kv_cache_free_gpu_memory_fraction 0.3
```
Then, I sent the following request:
```python
import requests

messages = [{"role": "user", "content": "سلام"}]
data = {
    "model": "model",
    "messages": messages,
    "temperature": 0.1,
    "top_k": 1,
    "max_tokens": 0,
    "stream": False,
}
url = 'http://127.0.0.1:8000/v1/chat/completions'
response = requests.post(url, json=data, headers={"Content-Type": "application/json"})
```
The first request failed with an error saying that `max_tokens` must not be zero. When I sent the exact same request a second time:

```python
response = requests.post(url, json=data, headers={"Content-Type": "application/json"})
```

the server crashed entirely and the model went down.
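Until the underlying issue is fixed, an invalid `max_tokens` can be rejected on the client side before it ever reaches the server. This is a minimal sketch using only the standard library; the `send_chat` helper is hypothetical and not part of TRT-LLM:

```python
import json
import urllib.request

def send_chat(url, messages, max_tokens, temperature=0.1, top_k=1, stream=False):
    """Hypothetical client helper: validate parameters locally, then POST
    to the OpenAI-compatible chat endpoint."""
    # Guard against max_tokens == 0, which the server rejects on the first
    # request and which appears to leave it in a broken state afterwards.
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    payload = json.dumps({
        "model": "model",
        "messages": messages,
        "temperature": temperature,
        "top_k": top_k,
        "max_tokens": max_tokens,
        "stream": stream,
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```

With this guard, the zero-token request from the reproduction raises a `ValueError` locally instead of being sent twice and taking the server down.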
Here are the logs:
```text
[2025-09-11 17:02:59] ERROR base_events.py:1821: Task exception was never retrieved
future: <Task finished name='Task-10' coro=<OpenAIServer.await_disconnected() done, defined at /usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py:161> exception=AttributeError("'NoneType' object has no attribute 'send'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 167, in await_disconnected
    promise.abort()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 532, in abort
    self._executor().abort_request(self.request_id)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/proxy.py", line 157, in abort_request
    self.request_queue.put(CancellingRequest(request_id))
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/ipc.py", line 123, in put
    self.socket.send(signed_data)
    ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'
[09/11/2025-17:03:07] [TRT-LLM] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 467, in openai_chat
    promise = self.llm.generate_async(
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 351, in generate_async
    raise RuntimeError("LLM is shutting down")
RuntimeError: LLM is shutting down
```
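The traceback suggests the abort path calls `.send()` on an IPC socket that has already been torn down (set to `None`) during shutdown. The sketch below is a hypothetical illustration of that failure pattern and a defensive check, not the actual TRT-LLM code:

```python
class IpcQueue:
    """Hypothetical stand-in for the IPC queue in the traceback above."""

    def __init__(self):
        # After shutdown begins, the socket is torn down (None), but the
        # abort path may still try to enqueue a CancellingRequest.
        self.socket = None

    def put(self, data):
        # Without this check, put() raises:
        #   AttributeError: 'NoneType' object has no attribute 'send'
        # exactly as in the logs. Failing with an explicit error (or
        # silently dropping the message) would avoid the crash.
        if self.socket is None:
            raise RuntimeError("IPC socket already closed; cannot send")
        self.socket.send(data)
```

If the real `IpcQueue.put` guarded against a closed socket like this, the abort triggered by the rejected request would fail gracefully instead of leaving the server in the "LLM is shutting down" state.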
Expected behavior
.
actual behavior
.
additional notes
.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.