System Info
We've noticed that when there is a mismatch between the dtype of the lora_plugin used while building the engine and the dtype passed as --storage-type when calling hf_lora_convert, the LoRA weights are not applied at all and we get the base-model response, even when passing the correct lora_task_id. This happens without any warnings or errors, which makes the issue hard to diagnose.
Example:
trtllm-build \
--checkpoint_dir ${UNIFIED_CKPT_PATH} \
--output_dir ${ENGINE_PATH} \
--lora_plugin bfloat16
and
python3 tensorrt_llm/examples/hf_lora_convert.py -i ${ENGINE_PATH}/lora/0 -o tmp/lora_prefetch/1 --storage-type float16
will always lead to the base-model response during inference.
However, switching the build-time lora_plugin to either auto or float16 returns the correct response.
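For reference, here is a matching pair that returns the correct LoRA response (the same float16 dtype on both sides, paths as in the commands above):
trtllm-build \
--checkpoint_dir ${UNIFIED_CKPT_PATH} \
--output_dir ${ENGINE_PATH} \
--lora_plugin float16
python3 tensorrt_llm/examples/hf_lora_convert.py -i ${ENGINE_PATH}/lora/0 -o tmp/lora_prefetch/1 --storage-type float16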
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Run trtllm-build with lora_plugin set to one dtype and hf_lora_convert with a different --storage-type (e.g. the commands above); a quick way to inspect the stored dtype of the converted weights is shown below
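To see which dtype the converted file actually holds, something like the following can help (the model.lora_weights.npy file name is an assumption about the converter's output; bfloat16 has no native NumPy dtype, so it may show up as a raw integer view):
python3 -c "import numpy as np; print(np.load('tmp/lora_prefetch/1/model.lora_weights.npy').dtype)"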
Expected behavior
A warning or error when the LoRA weights cannot be applied because of this dtype mismatch
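For illustration, a minimal sketch of the kind of check we would expect, assuming the caller knows the engine's lora_plugin dtype (e.g. from the engine's config.json) and the path to the converted weights; the function name and file layout here are hypothetical, not TensorRT-LLM API:

# Hypothetical dtype-consistency check; names and file layout are assumptions.
import numpy as np

def check_lora_dtype(lora_weights_path: str, engine_lora_dtype: str) -> None:
    # engine_lora_dtype: the lora_plugin value used at build time,
    # e.g. "float16" or "bfloat16".
    weights = np.load(lora_weights_path)
    weights_dtype = str(weights.dtype)  # e.g. "float16"
    # NumPy has no native bfloat16, so converters typically store bf16 data
    # as a uint16/float32 view; a robust check would read the converter's
    # own dtype metadata rather than rely on the NumPy dtype alone.
    if weights_dtype != engine_lora_dtype:
        raise ValueError(
            f"LoRA weights dtype ({weights_dtype}) does not match the "
            f"engine's lora_plugin dtype ({engine_lora_dtype}); the adapter "
            "will be silently ignored at inference time."
        )

Invoked at load time, e.g. check_lora_dtype('tmp/lora_prefetch/1/model.lora_weights.npy', 'bfloat16'), a check like this would surface the mismatch instead of failing silently.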
actual behavior
Fails silently: inference returns the base-model response with no warning or error.
additional notes
We used the llama3 example.