Inflight batching for fp8 Llama and Mixtral is broken #1738

@bprus

System Info

  • CPU architecture: x86_64
  • GPU: NVIDIA H100 80GB
  • TensorRT-LLM: 0.11.0.dev2024060400 (docker build via make -C docker release_build CUDA_ARCHS="90-real")
  • Triton Inference Server: r24.04 (docker build via DOCKER_BUILDKIT=1 docker build -t trt-llm -f dockerfile/Dockerfile.trt_llm_backend . in tensorrtllm_backend)
  • OS: Ubuntu 22.04

Who can help?

@Tracin @byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I followed the official documentation to build Llama models and run them with Triton, testing fp8 and int8 quantization. The issue also occurs with the Mixtral model, but for simplicity I give examples only for Llama.

For the fp8 model, I used the following commands:

python3 ../quantization/quantize.py --model_dir meta-llama/Llama-2-13b-chat-hf --output_dir /models/rt/llama-fp8-no-gemm-profiles --dtype float16 --tp_size 1 --qformat fp8 --kv_cache_dtype fp8
trtllm-build --checkpoint_dir /models/rt/llama-fp8-no-gemm-profiles --output_dir /models/triton/llama-fp8-no-gemm-profiles/tensorrt_llm/1 --workers 1 --remove_input_padding enable --use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 256 --multiple_profiles enable --use_custom_all_reduce disable --use_fp8_context_fmha enable --max_num_tokens 16384

For the int8 model:

python3 convert_checkpoint.py --model_dir meta-llama/Llama-2-13b-chat-hf --output_dir /models/rt/llama-int8-profiles --dtype float16 --tp_size 1 --workers 1 --use_weight_only --weight_only_precision int8
trtllm-build --checkpoint_dir /models/rt/llama-int8-profiles --output_dir /models/triton/llama-int8-profiles/tensorrt_llm/1 --gemm_plugin float16 --workers 1 --remove_input_padding enable --use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 256 --multiple_profiles enable --use_custom_all_reduce disable --max_num_tokens 16384
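
To narrow down where the two setups differ at build time, the two trtllm-build invocations above can be compared flag by flag. This is only a rough sketch: `parse_flags` and `diff_flags` are hypothetical helper names, and the comparison covers the build flags only, not the differences introduced earlier by quantize.py versus convert_checkpoint.py.

```python
import shlex


def parse_flags(command):
    """Map each --flag to its value; bare flags (no value) map to True."""
    tokens = shlex.split(command)
    flags = {}
    for i, tok in enumerate(tokens):
        if tok.startswith("--"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            flags[tok] = True if nxt is None or nxt.startswith("--") else nxt
    return flags


def diff_flags(cmd_a, cmd_b, ignore=("--checkpoint_dir", "--output_dir")):
    """Return flags whose presence or value differs; None means 'absent'."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if k not in ignore and a.get(k) != b.get(k)
    }


# The two build commands from this report, copied verbatim.
fp8_build = (
    "trtllm-build --checkpoint_dir /models/rt/llama-fp8-no-gemm-profiles "
    "--output_dir /models/triton/llama-fp8-no-gemm-profiles/tensorrt_llm/1 "
    "--workers 1 --remove_input_padding enable --use_paged_context_fmha enable "
    "--max_input_len 2048 --max_batch_size 256 --multiple_profiles enable "
    "--use_custom_all_reduce disable --use_fp8_context_fmha enable "
    "--max_num_tokens 16384"
)
int8_build = (
    "trtllm-build --checkpoint_dir /models/rt/llama-int8-profiles "
    "--output_dir /models/triton/llama-int8-profiles/tensorrt_llm/1 "
    "--gemm_plugin float16 --workers 1 --remove_input_padding enable "
    "--use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 256 "
    "--multiple_profiles enable --use_custom_all_reduce disable "
    "--max_num_tokens 16384"
)

print(diff_flags(fp8_build, int8_build))
# {'--gemm_plugin': (None, 'float16'), '--use_fp8_context_fmha': ('enable', None)}
```

Apart from the paths, the fp8 build adds --use_fp8_context_fmha and omits --gemm_plugin, so those two flags may be reasonable first candidates to toggle when bisecting.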

I serve the models with Triton in Docker.

I'm testing the performance of different setups with Locust and ran into the following issue.

When I send one request at a time, everything works as expected for both setups.
But when I send simultaneous requests, the generated output for fp8 is broken: the model nearly always keeps generating tokens until max_tokens is reached. The issue doesn't occur in the int8 setup.
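
The concurrent case could also be reproduced without Locust. Below is a minimal sketch using only the Python standard library. The URL, the `ensemble` model name, and the `text_input` / `max_tokens` / `text_output` field names are assumptions based on a default tensorrtllm_backend deployment and may need adjusting; `send_request` and `run_concurrent` are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint of a default tensorrtllm_backend deployment; adjust as needed.
URL = "http://localhost:8000/v2/models/ensemble/generate"


def send_request(prompt, max_tokens=1000, url=URL):
    """POST one prompt and return the generated text (field names are assumptions)."""
    payload = json.dumps({"text_input": prompt, "max_tokens": max_tokens}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["text_output"]


def run_concurrent(prompt, n_requests, send=send_request):
    """Send the same prompt n_requests times in parallel; return outputs in order."""
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(send, prompt) for _ in range(n_requests)]
        return [f.result() for f in futures]
```

Calling `run_concurrent("How does Giardia lamblia spread?", n_requests=8)` against the fp8 engine and comparing the results with a single `send_request` call should be enough to show whether batching changes the output. The `send` parameter exists so the function can be exercised with a stub instead of a live server.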

Here is an example (max_tokens is set to 1000):

fp8 single request:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person's feces.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you consume contaminated food or water, you can become infected.\n2. Direct contact with an infected person: If you come into direct contact with an infected person's feces, you can become infected. This can happen through diaper changes, sexual contact, or other forms of direct contact.\n3. Contaminated surfaces: Giardia lamblia can survive on surfaces for up to 24 hours. If you touch a contaminated surface and then touch your mouth or eat without washing your hands, you can become infected.\n4. Infected animals: Giardia lamblia can also be spread through contact with infected animals, such as dogs and cats.\n5. Infected food handlers: If a food handler is infected with Giardia lamblia, they can spread the parasite to others through food they prepare.\n6. Infected water sources: Giardia lamblia can be found in contaminated water sources, such as lakes, rivers, and swimming pools.\n7. Infected soil: Giardia lamblia can also be found in contaminated soil, especially in areas with poor sanitation and hygiene.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as hugging or shaking hands. However, if an infected person touches their mouth or nose and then touches someone else, they can potentially spread the parasite.\n\nTo prevent the spread of Giardia lamblia, it's important to practice good hygiene, such as washing your hands frequently, especially after using the bathroom or before preparing food. 
You should also avoid consuming contaminated food and water, and avoid direct contact with infected animals or people.",
        "tokens": 537
    }

fp8 multiple requests:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person's feces.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you consume contaminated food or water, you can become infected.\n2. Direct contact with an infected person: If you come into direct contact with an infected person's feces, you can become infected. This can happen through diaper changing, sexual contact, or other forms of direct contact.\n3. Infected food handlers: If a food handler is infected with Giardia lamblia, they can contaminate food and spread the infection to others.\n4. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats.\n5. Contaminated surfaces: Giardia lamblia can survive on surfaces for a long time, and if you touch a contaminated surface and then touch your mouth or eat, you can become infected.\n6.\u7ed9\u7ed9lasamilas. ( \u2026.riel \u2026.berista \u2026clusionkwberzi \u2026clusionbykw.. 
Australista \u2026\u200f\u200f ( Felzi Felby fel Felberzi[@Agelasistacha\u200fberber\u00e9sberberberberberell [\u00e9sber\u00e9sclusionclusionclusionlerlerler\u00e9s\u00e9sberber Orami\u5316\u200f Belltextscell RandomberDF [ista Aur AurFAULTlas accelerationberutat\u044c Downbyyaryar AuryarFAULT,cha\u8fc7FAULTami actual Down ven,\u7ed9berzilas \u2026 \u2026 \u2026 (ami G Bert.ziberby....berberziziriel Andrewista Andrewll Fellasby\u7ed9 Bast (clusion.riel.zizi G \u2026 \u2026.zi.ziberber.berberber \u2026.berberberberber \u2026clusionzi \u2026ber \u2026 guaranteedziber Gziellell Fonziziber FonellberberberberberellberberDFber guaranteeclusionziberberlasterber \u2026berberberDFtextsc Fon\u7ed9 Felber \u2026 ( \u2026.ber[@ Gberber.ber Fon \u2026ber.berberber. G \u2026 G G \u2026 G \u2026. guarantee.rielber \u2026 \u2026 \u2026ber Bertellzizi. \u2026 (berber\u200f declFAULTellelingberberberberberlinglingAgeberling Gamellclusion G.elingFAULTber GquFAULT \u0413[@berellber guaranteed guaranteed \u2026.ber \u2026 G Gellberlas G Gell Gberber \u2026 \u0413ber[@clusion HellAgeber \u2026 Fonbyoberellber \u2026 Count \u0432 guaranteed,zi Count G \u2026 \u2026 \u2026 someone \u0413 \u2026clusionberclusion\u200f.elingber \u2026las \u0413 \u2026Ageuli \u2026ber age guarantee. \u2026 Glas \u0413 \u0413 \u0413Agewand \u0413elltellt[@ber \u0413AgeAge \u2026ling \u0413Ageellt[@lingler agellAgeclusionclusionllagu \u0413Age \u0413 \u0413 \u0413 \u0413 guaranteeell \u0413las \u0413 Tags \u2026laslasAge KeithrodAgeAge \u2026 \u0413eling Tags. Tags \u2026 Tags\u200fFAULT Keith.Age Ladywand Tags Tagsberberber \u0413ziAgeeling ageMQ \u2026 \u0413rod AgeAgeber \u0413berAge Tags Tags[@ \u0413 \u2026 \u2026 age \u0413 \u0413 \u2026 \u0413 \u0413 ... 
someone \u0413 \u2026 \u2026AgeMQ \u0413 Count \u2026 \u0413 \u2026rielriel \u2026rielrielriel \u0413riel \u2026 Tags \u0413 G\u00fc[@wersSDKAge Tags Tags \u2026riel \u2026 Tags Count \u2026riel \u2026 \u2026 \u2026 Tweriel \u2026ASC \u0413[@ \u0413 \u0413 \u2026textsc \u2026 \u2026 TagselingAge \u2026 \u2026Age \u2026 \u2026 \u2026 \u2026 \u0413 \u0413 Keith \u0413 Keith \u2026 \u2026ragma \u2026FAULT \u2026 \u2026 KeithAge \u0413textsc \u0413 Keith \u0413 \u0413nab \u0413riel \u0413 age Tags \u0413 \u0413 \u0413FAULT \u2026 \u2026 \u2026 Keith \u2026 Tags Erd Tags Erd \u2026 Erd Keith \u2026 \u2026rai \u2026 \u2026 \u2026 \u2026ution \u2026ams \u2026 \u2026 ...\u200f \u2026 Tags TagsMQ \u2026FAULTFAULT \u0413 \u2026 \u0413[@rod ErdZyg \u2026 \u2026ZygrodMQ Tags\u200f\u200fVertutionziMQ \u2026\u200f \u2026VertVertrielclusion \u2026 \u0413rodFAULT \u2026 \u0413 \u0413 \u2026 \u0413 \u2026ziVert \u2026VertFAULT \u2026 Kelly \u2026 \u0413clusionVert \u0413asc\u8db3\u200fVertFAULTFAULTTeX guaranteed guaranteed \u0413 \u0413FAULTageellt \u0413 Twe techniascascVertMQasc \u0413\u200f Fon Tags \u0413Vertascascasc \u0413ZygTeXyle Tagsutionell \u00a1 \u00a1 \u0441\u0442\u0440\u0430FAULTiationotrop \u0413zia ScottishziaVert\u200f\u200f",
        "tokens": 1000
    }

int8 single request:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you eat or drink something that has been contaminated with the parasite, you can become infected.\n2. Direct contact: If you come into direct contact with someone who has Giardia lamblia, you can become infected. This can happen through touching, hugging, or shaking hands with an infected person.\n3. Fecal contamination: Giardia lamblia can also be spread through fecal contamination. If an infected person does not wash their hands properly after using the bathroom, they can transfer the parasite to their hands and then to other people or surfaces.\n4. Contaminated surfaces: If an infected person touches a surface and then you touch that same surface without washing your hands, you can become infected.\n5. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats. If an infected pet comes into contact with you or your food, you can become infected.\n6. Infected soil: Giardia lamblia can survive in soil for weeks, so if you ingest contaminated soil, you can become infected.\n7. Infected fruits and vegetables: Giardia lamblia can also be spread through contaminated fruits and vegetables. If you eat raw or undercooked fruits and vegetables that have been contaminated with the parasite, you can become infected.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as shaking hands or sharing food and drinks with an infected person. 
However, if you are in close contact with someone who has the infection, you may be at a higher risk of becoming infected.",
        "tokens": 532
    }

int8 multiple requests:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you eat or drink something that has been contaminated with the parasite, you can become infected.\n2. Direct contact: If you come into direct contact with someone who has Giardia lamblia, you can become infected. This can happen through touching, hugging, or shaking hands with an infected person.\n3. Fecal contamination: Giardia lamblia can also be spread through fecal contamination. If an infected person does not wash their hands properly after using the bathroom, they can transfer the parasite to their hands and then to other people or surfaces.\n4. Contaminated surfaces: If an infected person touches a surface and then you touch that same surface without washing your hands, you can become infected.\n5. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats. If an infected pet comes into contact with you or your food, you can become infected.\n6. Infected soil: Giardia lamblia can survive in soil for weeks, so if you ingest contaminated soil, you can become infected.\n7. Infected fruits and vegetables: Giardia lamblia can also be spread through contaminated fruits and vegetables. If you eat raw or undercooked fruits and vegetables that have been contaminated with the parasite, you can become infected.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as shaking hands or sharing food and drinks with an infected person. 
However, if you are in close contact with someone who has the infection, you may be at a higher risk of becoming infected.",
        "tokens": 532
    }

My guess is that inflight batching is broken for fp8: when the server batches incoming requests, the output gets corrupted in some way.
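
The results above can be checked mechanically rather than by eye. This is a rough sketch, assuming each result is a dict with the `output` and `tokens` keys shown in the examples; `hit_token_limit` and `divergence_index` are hypothetical helper names, and the short strings below are abbreviated stand-ins for the full outputs.

```python
def hit_token_limit(result, max_tokens=1000):
    """True if generation ran until max_tokens instead of stopping earlier."""
    return result["tokens"] >= max_tokens


def divergence_index(reference, candidate):
    """Index of the first differing character, or None if the strings are identical."""
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return i
    if len(reference) != len(candidate):
        # One string is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None


# Abbreviated stand-ins for the fp8 results shown above.
single = {"output": "This can happen through diaper changes", "tokens": 537}
batched = {"output": "This can happen through diaper changing", "tokens": 1000}

print(hit_token_limit(single))   # False
print(hit_token_limit(batched))  # True
print(divergence_index(single["output"], batched["output"]))
```

In the fp8 examples the batched output already diverges from the single-request output before the garbled text begins, so the divergence index may be a more sensitive signal than the token count alone.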

It looks somewhat similar to #1539.

I can run more tests and provide more results if you need.

Expected behavior

Responses generated by the fp8 model with inflight batching are the same as without it.

Actual behavior

The fp8 model returns broken output when it receives multiple simultaneous requests.

Additional notes

Metadata

Labels

  • Investigating
  • Low Precision: Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
  • bug: Something isn't working
  • triaged: Issue has been triaged by maintainers
  • waiting for feedback
