System Info
GPU: 2x A30, TensorRT-LLM branch: main, commit id: 66ef1df
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
MODEL_CHECKPOINT=/data/models/vicuna-7b-v1.5/
CONVERTED_CHECKPOINT=Llama-7b-hf-ckpt
DTYPE=float16
TP=2
echo "step 1: convert checkpoint"
# Build lora enabled engine
python convert_checkpoint.py --model_dir ${MODEL_CHECKPOINT} \
    --output_dir ${CONVERTED_CHECKPOINT} \
    --dtype ${DTYPE} \
    --tp_size ${TP} \
    --pp_size 1
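As a quick sanity check (my addition, not part of the original report), with --tp_size 2 the converted checkpoint directory should end up with a config plus one shard per rank:
ls ${CONVERTED_CHECKPOINT}
# expected roughly: config.json  rank0.safetensors  rank1.safetensors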
SOURCE_LORA=/data/Llama2-Chinese-7b-Chat-LoRA/
#SOURCE_LORA=/data/llama2-7b-lora.tar.gz
CPP_LORA=chinese-llama-2-lora-7b-cpp
EG_DIR=/tmp/lora-eg
PP=1
MAX_LEN=1024
MAX_BATCH=16
TOKENIZER=/data/models/vicuna-7b-v1.5/
LORA_ENGINE=Llama-2-7b-hf-engine
NUM_LORAS=(8)
NUM_REQUESTS=200
echo "step 2: trtllm-build"
trtllm-build \
    --checkpoint_dir ${CONVERTED_CHECKPOINT} \
    --output_dir ${LORA_ENGINE} \
    --max_batch_size ${MAX_BATCH} \
    --max_input_len $MAX_LEN \
    --max_output_len $MAX_LEN \
    --gpt_attention_plugin float16 \
    --paged_kv_cache enable \
    --remove_input_padding enable \
    --gemm_plugin float16 \
    --lora_plugin float16 \
    --use_paged_context_fmha enable \
    --use_custom_all_reduce disable \
    --lora_target_modules attn_qkv attn_dense mlp_h_to_4h mlp_gate mlp_4h_to_h
echo "step 3: Convert LoRA to cpp format"
# Convert LoRA to cpp format
python ../hf_lora_convert.py \
    -i $SOURCE_LORA \
    --storage-type $DTYPE \
    -o $CPP_LORA
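If the conversion succeeds, the cpp-format LoRA directory should hold the numpy files the C++ runtime reads (file names taken from the TensorRT-LLM LoRA docs, so treat them as an assumption and adjust if your branch differs):
ls ${CPP_LORA}
# expected roughly: model.lora_config.npy  model.lora_weights.npy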
echo "step 4: prepare dataset for non-lora requests"
mkdir -p $EG_DIR/data
python ../../benchmarks/cpp/prepare_dataset.py \
    --output ${EG_DIR}/data/token-norm-dist.json \
    --request-rate -1 \
    --time-delay-dist constant \
    --tokenizer $TOKENIZER \
    token-norm-dist \
    --num-requests $NUM_REQUESTS \
    --input-mean 256 --input-stdev 16 --output-mean 128 --output-stdev 24
echo "step 5: prepare dataset for lora requests"
for nloras in ${NUM_LORAS[@]}; do
    python ../../benchmarks/cpp/prepare_dataset.py \
        --output "${EG_DIR}/data/token-norm-dist-lora-${nloras}.json" \
        --request-rate -1 \
        --time-delay-dist constant \
        --rand-task-id 0 $(( $nloras - 1 )) \
        --tokenizer $TOKENIZER \
        token-norm-dist \
        --num-requests $NUM_REQUESTS \
        --input-mean 256 --input-stdev 16 --output-mean 128 --output-stdev 24
done
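A hypothetical check on my side, assuming prepare_dataset.py writes a "task_id" field per request, to confirm the LoRA dataset really references task ids 0..7:
# List the distinct task ids in the LoRA dataset (field name is an assumption).
grep -o '"task_id": *[0-9]*' "${EG_DIR}/data/token-norm-dist-lora-8.json" | sort -u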
mkdir -p ${EG_DIR}/log-base-lora
NUM_LAYERS=32
NUM_LORA_MODS=8
MAX_LORA_RANK=8
EOS_ID=-1
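# For reference (my reading of --lora_num_device_mod_layers, not from the report):
# the expression below sizes the device LoRA cache for
# 8 LoRAs * 32 layers * 8 modules * rank 8 = 16384 module-layer slots.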
mpirun -n ${TP} --allow-run-as-root --output-filename ${EG_DIR}/log-base-lora \
    ../../cpp/build/benchmarks/gptManagerBenchmark \
    --engine_dir $LORA_ENGINE \
    --type IFB \
    --dataset "${EG_DIR}/data/token-norm-dist-lora-8.json" \
    --lora_host_cache_bytes 8589934592 \
    --lora_num_device_mod_layers $(( 8 * $NUM_LAYERS * $NUM_LORA_MODS * $MAX_LORA_RANK )) \
    --kv_cache_free_gpu_mem_fraction 0.80 \
    --log_level info \
    --eos_id ${EOS_ID}
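One observation of mine, not part of the original report: the converted LoRA in ${CPP_LORA} is never registered with the benchmark, while the error below complains that the LoRA weights were not sent with the request. The repo's LoRA benchmarking example, as far as I recall it, adds a step like the sketch below; generate_rand_loras.py and --lora_dir are assumptions on my side, so verify them against your checkout:
# Hedged sketch: materialize per-task LoRA weight dirs and point the benchmark at them.
python ../../benchmarks/cpp/utils/generate_rand_loras.py ${CPP_LORA} ${EG_DIR}/loras 8
# ...and append to the gptManagerBenchmark invocation above:
#   --lora_dir ${EG_DIR}/loras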
Expected behavior
gptManagerBenchmark runs the LoRA benchmark successfully.
Actual behavior
Failed to run the gptManagerBenchmark:
[TensorRT-LLM][ERROR] Cannot process new request: [TensorRT-LLM][ERROR] Assertion failed: LoRA task 0 not found in cache. Please send LoRA weights with request (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp:182)
1 0x5572c6dedde9 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2 0x7f56c6cd5378 /data/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(+0x69c378) [0x7f56c6cd5378]
3 0x7f56c8c3f03f tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 127
4 0x7f56c8c03078 tensorrt_llm::batch_manager::GptManager::fetchNewRequests() + 1464
5 0x7f56c8c0342a tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 170
6 0x7f56c64dd253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f56c64dd253]
7 0x7f56c624cac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f56c624cac3]
8 0x7f56c62de850 /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f56c62de850]
Additional notes
None