[ROCm] Effort to reduce the number of environment variables in command line #17229
Conversation
Signed-off-by: Hongxia Yang <[email protected]>
Can we add a test plan?
```diff
@@ -114,6 +114,12 @@ COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples
 ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
 ENV TOKENIZERS_PARALLELISM=false

+# ENV that can improve safe tensor loading, and end-to-end time
+ENV SAFETENSORS_FAST_GPU=1
```
> ENV that can improve safe tensor loading, and end-to-end time

I didn't find this variable in the vLLM repository. Could you remind me why it can improve loading time?
This is for safetensors; see https://huggingface.co/docs/safetensors/en/speed for more details.
docker/Dockerfile.rocm (outdated)

```diff
+# ENV that can improve safe tensor loading, and end-to-end time
+ENV SAFETENSORS_FAST_GPU=1
+# ENV that needed for multi-process on cuda-like platform
+ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
```
Can we add a comment to elaborate on why spawn is needed here? Is it due to some compatibility issue?
> Can we add comment to elaborate why spawn is needed here? Is it due to some compatibility issue?
There was a time when this was broken on ROCm, and the fix was to force the spawn method on the ROCm platform. See the issue below:
#7791
However, during Llama 4 enablement in March, I found the issue had come back when running simple scripts, and we have been setting this env variable in our scripts since then.
So, to make things safe and stable in all situations, I think it is user-friendly to put it in the Dockerfile.
Right now, the default is set to "fork" in envs.py:

```python
# Use dedicated multiprocess context for workers.
# Both spawn and fork work
"VLLM_WORKER_MULTIPROC_METHOD":
    lambda: os.getenv("VLLM_WORKER_MULTIPROC_METHOD", "fork"),
```
But if we search the code, there are many places that force `VLLM_WORKER_MULTIPROC_METHOD=spawn`, for example:
```python
if reason is not None:
    logger.warning(
        "We must use the `spawn` multiprocessing start method. "
        "Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. "
        "See https://docs.vllm.ai/en/latest/getting_started/"
        "troubleshooting.html#python-multiprocessing "
        "for more information. Reason: %s", reason)
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```
Added the test plan in the description of the pull request.
Signed-off-by: Hongxia Yang <[email protected]>
Signed-off-by: Hongxia Yang <[email protected]>
This sets two environment variables in the Dockerfile so that users do not have to pass them on the command line when running scripts:
- an ENV that can improve safetensors loading, and end-to-end time
- an ENV that is needed for multi-processing on CUDA-like platforms
Test: