System Info
- CPU: AMD EPYC 7H12 (32 cores)
- GPU: NVIDIA A100-SXM4-80GB
Who can help?
No response
Information
Tasks
Reproduction
- pull triton official image
- clone tensorrtllm_backend
- move your model to repo and make config files
- start the container with a docker-compose.yaml file like the one below

```yaml
services:
  tritonserver:
    image: triton_trt_llm
    network_mode: "host"
    container_name: triton
    shm_size: '1gb'
    volumes:
      - /data:/workspace
    working_dir: /workspace
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: bash -c "python3 ./tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=tensorrtllm_backend/all_models/inflight_batcher_llm/"
```
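As a workaround until the launcher itself blocks, the compose `command` can chain a generic keep-alive so the container's main process stays alive after the script returns (a sketch, assuming the same paths as above; `tail -f /dev/null` is just a placeholder keep-alive, not part of the repo):

```yaml
command: bash -c "python3 ./tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=tensorrtllm_backend/all_models/inflight_batcher_llm/ && tail -f /dev/null"
```

This keeps the container running but does not propagate the server's exit status, so fixing the launcher script itself is preferable.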
Expected behavior
After running the docker compose up command, I expect the container to start the Triton server, wait for it, and remain running unless an error occurs.
Actual behavior
The container starts, runs the Python script, and exits immediately without waiting for the Triton server and TensorRT-LLM backend; because `restart: always` is set, Docker then restarts it in a loop.
Additional notes
This bug can be fixed by making the launcher wait on its child process: add one call after the last line of scripts/launch_triton_server.py, like this:

```python
child = subprocess.Popen(cmd, env=env)
child.communicate()  # block until the Triton server process exits
```
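A minimal, runnable sketch of why this fixes the early exit (`sleep` stands in for the tritonserver child so the example runs anywhere; `cmd`/`env` in the real script come from the launcher itself):

```python
import subprocess

# Stand-in for the launched tritonserver process. Popen returns
# immediately after spawning the child; it does not wait for it.
child = subprocess.Popen(["sleep", "1"])

# Without this call the script would return here, bash would exit,
# and Docker would stop the container because its main process ended.
child.communicate()  # blocks until the child terminates

# After communicate() returns, the child's exit code is available.
assert child.returncode == 0
```

The same effect could be achieved with `child.wait()`; `communicate()` additionally drains stdout/stderr pipes if any were requested.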