-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
System Info
GPU A30 * 2
TensorRT-LLM version: v0.9.0
Model: vicuna 13B
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the
examples
folder (such as GLUE/SQuAD, ...) - My own task or dataset (give details below)
Reproduction
- build engine
python convert_checkpoint.py --model_dir /data/weilong.yu/vicuna-13b/vicuna-13b-v1.5/ \
--output_dir ./tllm_checkpoint_2gpu_fp16 \
--dtype float16 --tp_size 2
trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_fp16 \
--output_dir ./tmp/llama/13B/trt_engines/fp16/2-gpu \
--gemm_plugin float16 \
--use_fused_mlp \
--max_batch_size $1 \
--max_input_len 2048 \
--max_output_len 256 \
--context_fmha enable \
--paged_kv_cache enable \
--use_paged_context_fmha enable \
--remove_input_padding enable --workers 2 \
--use_fused_mlp
- run benchmark
mpirun -n 2 --allow-run-as-root ./gptManagerBenchmark --engine_dir ../../../examples/llama/tmp/llama/13B/trt_engines/fp16/2-gpu/ --dataset ../../../benchmarks/cpp/token-norm-dist.json --kv_cache_free_gpu_mem_fraction 0.85 --enable_kv_cache_reuse -enable_chunked_context
Expected behavior
No error message.
actual behavior
sh run.sh
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 712 exceeds buffer size 471
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 712 exceeds buffer size 471
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1553 exceeds buffer size 927
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1553 exceeds buffer size 927
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 884 exceeds buffer size 642
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 884 exceeds buffer size 642
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1192 exceeds buffer size 951
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1192 exceeds buffer size 951
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1253 exceeds buffer size 1012
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1253 exceeds buffer size 1012
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][WARNING] Step function failed, continuing.
[BENCHMARK] num_samples 200
[BENCHMARK] total_latency(ms) 71149.43
[BENCHMARK] seq_throughput(seq/sec) 2.81
[BENCHMARK] token_throughput(token/sec) 531.37
[BENCHMARK] avg_sequence_latency(ms) 22587.76
[BENCHMARK] p99_sequence_latency(ms) 50983.86
[BENCHMARK] p90_sequence_latency(ms) 45602.29
[BENCHMARK] p50_sequence_latency(ms) 14514.95
[TensorRT-LLM][INFO] Terminate signal received, worker thread exiting.
[TensorRT-LLM][INFO] Terminate signal received, worker thread exiting.
additional notes
no
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working