
[ROCm] Effort to reduce the number of environment variables in command line #17229


Open: wants to merge 3 commits into base: main

Conversation

hongxiayang
Collaborator

@hongxiayang hongxiayang commented Apr 26, 2025

This sets two environment variables in the Dockerfile so that users need to pass fewer environment variables when running scripts.

ENV that can improve safetensors loading and end-to-end time:

ENV SAFETENSORS_FAST_GPU=1

ENV that is needed for multiprocessing on CUDA-like platforms:

ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

Test:

  1. Build the docker image:
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
  2. Run the docker container and check that the env vars are set in the container:
docker run --rm vllm-rocm env | grep spawn
VLLM_WORKER_MULTIPROC_METHOD=spawn

docker run --rm vllm-rocm env | grep FAST
SAFETENSORS_FAST_GPU=1
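Outside Docker, the same check can be sketched in plain Python. Here the two variables are set in-process so the sketch is self-contained, standing in for the values the Dockerfile bakes into the image:

```python
import os

# Stand-in for the values baked in by the Dockerfile; in a real
# container these would already be present in the environment.
os.environ["SAFETENSORS_FAST_GPU"] = "1"
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

for name in ("SAFETENSORS_FAST_GPU", "VLLM_WORKER_MULTIPROC_METHOD"):
    print(f"{name}={os.environ[name]}")
```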


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests that quickly catches errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Apr 26, 2025
@houseroad houseroad added the rocm Related to AMD ROCm label Apr 26, 2025
@houseroad
Collaborator

Can we add a test plan?

@@ -114,6 +114,12 @@ COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples
ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
ENV TOKENIZERS_PARALLELISM=false

# ENV that can improve safe tensor loading, and end-to-end time
ENV SAFETENSORS_FAST_GPU=1
Contributor

ENV that can improve safe tensor loading,

I didn't find this variable in the vllm repository. Could you remind me why it can improve loading time?

Collaborator

This is for safetensors; see https://huggingface.co/docs/safetensors/en/speed for more details.
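As a rough illustration of how such an opt-in flag is typically consumed (a hypothetical `pick_load_device` helper, not the actual safetensors or transformers code):

```python
import os

def pick_load_device(requested: str = "cuda:0") -> str:
    """Hypothetical gate: load tensors directly to the requested GPU
    only when the opt-in flag is set; otherwise fall back to CPU.
    The real check lives in the consuming library, not here."""
    if os.environ.get("SAFETENSORS_FAST_GPU") == "1":
        return requested
    return "cpu"

os.environ["SAFETENSORS_FAST_GPU"] = "1"
print(pick_load_device())  # -> cuda:0
```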

# ENV that can improve safe tensor loading, and end-to-end time
ENV SAFETENSORS_FAST_GPU=1
# ENV that needed for multi-process on cuda-like platform
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
Collaborator

Can we add comment to elaborate why spawn is needed here? Is it due to some compatibility issue?

Collaborator Author

@hongxiayang hongxiayang Apr 27, 2025

Can we add comment to elaborate why spawn is needed here? Is it due to some compatibility issue?

There was a time when this was fixed for ROCm; the fix was to force using spawn on the ROCm platform. See the issue below:
#7791

However, during llama4 enablement in March, I found the issue had come back when running simple scripts, and we have been setting this env var in our scripts since then.

So, to make it safe and stable for all situations, I think it is user-friendly to put it in the Dockerfile.

Right now, the default is set to "fork" in envs.py:

   # Use dedicated multiprocess context for workers.
   # Both spawn and fork work
   "VLLM_WORKER_MULTIPROC_METHOD":
   lambda: os.getenv("VLLM_WORKER_MULTIPROC_METHOD", "fork"),

But if we search the code, there are many places that force VLLM_WORKER_MULTIPROC_METHOD=spawn, for example:

    if reason is not None:
        logger.warning(
            "We must use the `spawn` multiprocessing start method. "
            "Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. "
            "See https://docs.vllm.ai/en/latest/getting_started/"
            "troubleshooting.html#python-multiprocessing "
            "for more information. Reason: %s", reason)
        os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
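The interaction between the default and the override can be sketched as follows (a minimal standalone sketch, not vLLM's actual worker code):

```python
import multiprocessing
import os

# Mirror envs.py: read the worker start method from the
# environment, defaulting to "fork".
method = os.environ.get("VLLM_WORKER_MULTIPROC_METHOD", "fork")

# Mirror the override path: code that knows it needs spawn
# (e.g. on the ROCm platform) forces the variable before
# building a multiprocessing context from it.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
method = os.environ["VLLM_WORKER_MULTIPROC_METHOD"]

ctx = multiprocessing.get_context(method)
print(ctx.get_start_method())  # -> spawn
```

Baking the variable into the image makes this the default everywhere, so the per-script override (and the warning that accompanies it) is no longer needed.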

@hongxiayang
Collaborator Author

Can we add a test plan?

Added the test plan in the description of the pull request

Signed-off-by: Hongxia Yang <[email protected]>
Labels
ci/build rocm Related to AMD ROCm
3 participants