
[Bug]: Deadly Request: Model Crashes After Specific Chat Completion Call #7692

@Alireza3242

Description


System Info

gpu: A100

docker image:
nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc4

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I served the model as follows:

trtllm-serve /app/data/gemma-3-4b-it/model --max_batch_size 128 --host 0.0.0.0 \
--kv_cache_free_gpu_memory_fraction 0.3

Then, I sent the following request:

import requests

messages = [{"role": "user", "content": "سلام"}]  # "سلام" is Persian for "Hello"

data = {
    "model": "model",
    "messages": messages,
    "temperature": 0.1,
    "top_k": 1,
    "max_tokens": 0,
    "stream": False
}

url = 'http://127.0.0.1:8000/v1/chat/completions'
response = requests.post(url, json=data, headers={"Content-Type": "application/json"})

The first time, I got an error saying that max_tokens must not be zero.
The second time I sent the exact same request:

response = requests.post(url, json=data, headers={"Content-Type": "application/json"})

this time it brought the server down completely.
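Until this is fixed server-side, the invalid value can be caught before it ever reaches the server. The sketch below is a hypothetical client-side guard (the helper name `validate_chat_request` is mine, not part of any library); it mirrors the payload above and refuses to send a request with a non-positive `max_tokens`:

```python
def validate_chat_request(data: dict) -> dict:
    """Hypothetical client-side guard: reject max_tokens <= 0 before POSTing."""
    max_tokens = data.get("max_tokens")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError(f"max_tokens must be a positive integer, got {max_tokens}")
    return data

data = {
    "model": "model",
    "messages": [{"role": "user", "content": "سلام"}],
    "temperature": 0.1,
    "top_k": 1,
    "max_tokens": 0,   # invalid: caught locally instead of reaching the server
    "stream": False,
}

try:
    validate_chat_request(data)
    # Only reached for valid payloads:
    # requests.post(url, json=data, headers={"Content-Type": "application/json"})
except ValueError as e:
    print(e)  # max_tokens must be a positive integer, got 0
```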

Here are the logs:

[2025-09-11 17:02:59] ERROR base_events.py:1821: Task exception was never retrieved
future: <Task finished name='Task-10' coro=<OpenAIServer.await_disconnected() done, defined at /usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py:161> exception=AttributeError("'NoneType' object has no attribute 'send'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 167, in await_disconnected
    promise.abort()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/result.py", line 532, in abort
    self._executor().abort_request(self.request_id)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/proxy.py", line 157, in abort_request
    self.request_queue.put(CancellingRequest(request_id))
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/ipc.py", line 123, in put
    self.socket.send(signed_data)
    ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'
[09/11/2025-17:03:07] [TRT-LLM] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_server.py", line 467, in openai_chat
    promise = self.llm.generate_async(
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 351, in generate_async
    raise RuntimeError("LLM is shutting down")
RuntimeError: LLM is shutting down
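Reading the two tracebacks together, it looks like the first invalid request leaves the executor shutting down, and the disconnect watcher then tries to abort the request over an IPC socket that has already been torn down, so `self.socket` is `None` when `put` calls `send`. The minimal sketch below (class and method names are illustrative, not the actual `tensorrt_llm` code) reproduces that failure mode and shows the kind of guard the abort path appears to lack:

```python
class FakeIpcQueue:
    """Stand-in for the IPC queue in the traceback above.

    After close(), self.socket is None; an unguarded put() then raises
    AttributeError: 'NoneType' object has no attribute 'send' -- the same
    error shown in the log. Names here are illustrative only.
    """

    def __init__(self, socket=None):
        self.socket = socket

    def close(self):
        self.socket = None  # what teardown effectively does

    def put_unguarded(self, data):
        self.socket.send(data)  # crashes once the socket is gone

    def put_guarded(self, data):
        if self.socket is None:     # defensive check before sending
            return False            # drop the cancel request quietly
        self.socket.send(data)
        return True
```

With the guard in place, a late `CancellingRequest` would be dropped instead of killing the event-loop task.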

Expected behavior

The server should reject the invalid request (max_tokens = 0) with a validation error and keep serving subsequent requests.

Actual behavior

The second identical request crashes the server; every request after that fails with RuntimeError: LLM is shutting down.

Additional notes

None.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

No one assigned

Labels

Inference runtime<NV> (General operational aspects of TRTLLM execution not in other categories), bug (Something isn't working)
