Description
🚀 The feature, motivation and pitch
I'm currently exploring the integration of the upstream vLLM Helm chart with llm-d, a Kubernetes-native distributed inferencing stack. llm-d utilizes a sidecar container as a routing proxy for prefill/decode scenarios, which forwards requests to prefill pods. This proxy is deployed as an init container on decode instances to ensure it is available before the main server starts.
However, the current upstream vLLM Helm chart has a limitation: when .extraInit is specified, the init container is hardcoded to perform model downloads. This prevents us from customizing init container behavior for use cases like llm-d. To benchmark llm-d, we need a more flexible init container configuration.
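To make the limitation concrete, here is a minimal Python sketch of the rendering decision as described above. It is not the chart's actual Go template: the function name current_init_containers is hypothetical, and each container is reduced to its name (the real spec with image, command, and volumes is omitted). The point is that any non-empty extraInit yields exactly one download container, and there is no field for adding anything else.

```python
from typing import Any


# Hypothetical model of the current behavior described in this issue.
# Containers are reduced to their names; the real chart renders a full spec.
def current_init_containers(values: dict[str, Any]) -> list[dict[str, Any]]:
    extra_init = values.get("extraInit") or {}
    if not extra_init:
        # No extraInit: no init containers at all.
        return []
    # Any non-empty extraInit yields exactly one container, which always
    # downloads the model; there is no way to add other init containers.
    return [{"name": "wait-download-model"}]


values = {
    "extraInit": {
        "s3modelpath": "relative_s3_model_path/opt-125m",
        "pvcStorage": "1Gi",
        "awsEc2MetadataDisabled": True,
    }
}
print([c["name"] for c in current_init_containers(values)])
# ['wait-download-model']
```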
I have two potential approaches to address this. Depending on community feedback, I'm happy to open a PR for the preferred solution.
Alternatives
Solution 1: a refactor that introduces a more extensible init container specification. It is cleaner, but a breaking change for existing users.
Move the model download specs into .extraInit.downloadModel. If .extraInit.downloadModel.enable == true, the wait-download-model container is the first init container, and the containers listed under .extraInit.custom are appended after it. By default, values.yaml will set downloadModel.enable to true. A user's values.yaml that includes the model download container may look like this:
extraInit:
  # When enable is true, the model download container is created first in the list
  downloadModel:
    enable: true
    s3modelpath: "relative_s3_model_path/opt-125m"
    pvcStorage: "1Gi"
    awsEc2MetadataDisabled: true
  # Add custom init containers
  custom:
    - name: llm-d-routing-proxy
      image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
      args: []
      command: []
    - name: another-init
      # ...
And without the download container:
extraInit:
  # When enable is true, the model download container is created first in the list
  downloadModel:
    enable: false
  # Add custom init containers
  custom:
    - name: llm-d-routing-proxy
      image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
      args: []
      command: []
    - name: another-init
      # ...
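The ordering rule of Solution 1 can be sketched in Python as follows. This models only the list-building decision, not the chart's Go template; solution1_init_containers and DOWNLOAD_CONTAINER are hypothetical names, containers are minimal dicts, and values is assumed to be already merged with the chart defaults (where, under this proposal, downloadModel.enable is true).

```python
from typing import Any

# Container name taken from the proposal text; the full spec is omitted.
DOWNLOAD_CONTAINER = "wait-download-model"


def solution1_init_containers(values: dict[str, Any]) -> list[dict[str, Any]]:
    # Assumes values is already merged with the chart's values.yaml,
    # which under this proposal sets extraInit.downloadModel.enable: true.
    extra_init = values.get("extraInit") or {}
    download = extra_init.get("downloadModel") or {}
    containers: list[dict[str, Any]] = []
    if download.get("enable", False):
        # The model download container always comes first.
        containers.append({"name": DOWNLOAD_CONTAINER})
    # User-supplied init containers follow, in the order listed.
    containers.extend(extra_init.get("custom") or [])
    return containers


custom = [
    {"name": "llm-d-routing-proxy",
     "image": "ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0",
     "args": [], "command": []},
    {"name": "another-init"},
]
with_download = {"extraInit": {"downloadModel": {"enable": True}, "custom": custom}}
without_download = {"extraInit": {"downloadModel": {"enable": False}, "custom": custom}}
print([c["name"] for c in solution1_init_containers(with_download)])
# ['wait-download-model', 'llm-d-routing-proxy', 'another-init']
print([c["name"] for c in solution1_init_containers(without_download)])
# ['llm-d-routing-proxy', 'another-init']
```

Because enable is an explicit switch, the custom list is honored whether or not the model is downloaded, which is the case the llm-d decode pods need.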
Solution 2: non-breaking, but not programmatically elegant. A workaround that maintains backward compatibility but may not be ideal in terms of chart design.
Add a new field to values.yaml, .Values.extraCustomInit, where users can specify their own init containers. When .Values.extraInit is non-empty, these containers are appended after the wait-download-model container; otherwise they form the entire initContainers list. This approach does not break existing users' deployments, but it adds cognitive load to the values interface. By default, values.yaml will have extraCustomInit: [].
Including the model download container:
extraInit:
  s3modelpath: "relative_s3_model_path/opt-125m"
  pvcStorage: "1Gi"
  awsEc2MetadataDisabled: true
# Add custom init containers
extraCustomInit:
  - name: llm-d-routing-proxy
    image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
    args: []
    command: []
  - name: another-init
    # ...
And excluding the model download container:
extraInit: {}
# Add custom init containers
extraCustomInit:
  - name: llm-d-routing-proxy
    image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
    args: []
    command: []
  - name: another-init
    # ...
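For comparison, a minimal Python sketch of the Solution 2 rule, again modeling only the list-building decision rather than the chart's Go template. The names solution2_init_containers and DOWNLOAD_CONTAINER are hypothetical, and containers are reduced to their names. The first call shows why the change is non-breaking: values that predate extraCustomInit render exactly as before.

```python
from typing import Any

# Container name taken from the proposal text; the full spec is omitted.
DOWNLOAD_CONTAINER = "wait-download-model"


def solution2_init_containers(values: dict[str, Any]) -> list[dict[str, Any]]:
    containers: list[dict[str, Any]] = []
    # Existing semantics are untouched: a non-empty extraInit still means
    # "render the model download container".
    if values.get("extraInit"):
        containers.append({"name": DOWNLOAD_CONTAINER})
    # The new top-level list (default: []) is appended in either case.
    containers.extend(values.get("extraCustomInit") or [])
    return containers


custom = [{"name": "llm-d-routing-proxy"}, {"name": "another-init"}]
legacy = {"extraInit": {"s3modelpath": "relative_s3_model_path/opt-125m"}}

print([c["name"] for c in solution2_init_containers(legacy)])
# ['wait-download-model']
print([c["name"] for c in solution2_init_containers({**legacy, "extraCustomInit": custom})])
# ['wait-download-model', 'llm-d-routing-proxy', 'another-init']
print([c["name"] for c in solution2_init_containers({"extraInit": {}, "extraCustomInit": custom})])
# ['llm-d-routing-proxy', 'another-init']
```

The trade-off is visible in the sketch: whether the download container is rendered still depends on extraInit being non-empty rather than on an explicit flag, which is the implicit behavior Solution 1 replaces.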
Additional context
No response