This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) using Ray Serve and the Ray Operator add-on.
About Ray and Ray Serve
Ray is an open-source scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray used for scaling and serving models in a distributed environment. For more information, see Ray Serve in the Ray documentation.
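To illustrate the Ray Serve programming model, the following is a minimal sketch of a Serve application. The EchoModel deployment is hypothetical and only shows the API shape; the Stable Diffusion application used later in this guide follows the same pattern with a real model.
# Minimal Ray Serve application sketch (hypothetical EchoModel deployment).
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)
class EchoModel:
    async def __call__(self, request: Request) -> str:
        # In a real model server, inference would run here.
        payload = await request.json()
        return payload.get("prompt", "")


# A command such as `serve run module:entrypoint` deploys this application.
entrypoint = EchoModel.bind()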
You can use a RayCluster or RayService resource to deploy your Ray Serve applications. In production, use a RayService resource for the following reasons:
- In-place updates for Ray Serve applications
- Zero-downtime upgrades for RayCluster resources
- Highly available Ray Serve applications
Prepare your environment
To prepare your environment, follow these steps:
Launch a Cloud Shell session by clicking Activate Cloud Shell in the Google Cloud console. This opens a session in the bottom pane of the console.
Set environment variables:
export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
Clone the GitHub repository:
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
Change to the working directory:
cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
Create a Python virtual environment:
venv
python -m venv myenv && \
source myenv/bin/activate
Conda
Run the following commands:
conda create -c conda-forge python=3.9.19 -n myenv && \
conda activate myenv
When you deploy a Serve application with serve run, Ray expects the Python version of the local client to match the version used in the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you're running a different client version, select the appropriate Ray image.

Install the required dependencies to run the Serve application:

pip install ray[serve]==2.37.0
pip install torch
pip install requests
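Optionally, you can confirm that your local environment lines up with the cluster image by running a short Python check (a convenience snippet, not part of the original sample):
# Optional check: the local Python and Ray versions should match the
# rayproject/ray:2.37.0 image used by the cluster (Python 3.9, Ray 2.37.0).
import sys
import ray

print("Python:", ".".join(map(str, sys.version_info[:3])))  # expect 3.9.x
print("Ray:", ray.__version__)                              # expect 2.37.0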
Create a cluster with a GPU node pool
Create an Autopilot or Standard GKE cluster with a GPU node pool:
Autopilot
Create an Autopilot cluster:
gcloud container clusters create-auto ${CLUSTER_NAME} \
--enable-ray-operator \
--cluster-version=${CLUSTER_VERSION} \
--location=${COMPUTE_REGION}
Standard
Create a Standard cluster:
gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION} \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1
Create a GPU node pool:
gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest
Deploy a RayCluster resource
To deploy a RayCluster resource:
Review the ray-cluster.yaml manifest in the repository. This manifest describes a RayCluster resource.
Apply the manifest to your cluster:
kubectl apply -f ray-cluster.yaml
Verify the RayCluster resource is ready:
kubectl get raycluster
The output is similar to the following:
NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
Connect to the RayCluster resource
To connect to the RayCluster resource:
Verify that GKE created the RayCluster service:
kubectl get svc stable-diffusion-cluster-head-svc
The output is similar to the following:
NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s
Establish port-forwarding sessions to the Ray head:
kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
Verify that the Ray client can connect to the Ray cluster using localhost:
ray list nodes --address http://localhost:8265
The output is similar to the following:
======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                     NODE_IP       IS_HEAD_NODE    STATE    NODE_NAME     RESOURCES_TOTAL     LABELS
 0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2    10.28.1.21    False           ALIVE    10.28.1.21    CPU: 2.0            ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted
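As an additional check, you can connect through the Ray Client port from Python. This is a convenience snippet that assumes the port-forwarding sessions above are still active:
# Connect to the Ray cluster through the port-forwarded Ray Client port and
# print the cluster's aggregate resources (CPUs, memory, and any GPUs).
import ray

ray.init(address="ray://localhost:10001")
print(ray.cluster_resources())
ray.shutdown()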
Run a Ray Serve application
To run a Ray Serve application:
Run the Stable Diffusion Ray Serve application:
serve run stable_diffusion:entrypoint \
    --working-dir=. \
    --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' \
    --address ray://localhost:10001
The output is similar to the following:
2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
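The log lines show two deployments, APIIngress and StableDiffusionV2. The following is a rough sketch of how stable_diffusion.py is typically structured, based on the upstream Ray Serve Stable Diffusion example; the file in the cloned repository is the authoritative version, and details such as the model ID, image size, and endpoint path are assumptions:
# Sketch of the Serve application: an HTTP ingress deployment that forwards
# prompts to a GPU-backed Stable Diffusion deployment.
from io import BytesIO

from fastapi import FastAPI
from fastapi.responses import Response
from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    def __init__(self, diffusion_model_handle: DeploymentHandle) -> None:
        self.handle = diffusion_model_handle

    @app.get("/imagine", response_class=Response)
    async def generate(self, prompt: str, img_size: int = 512):
        # Forward the prompt to the model deployment and return PNG bytes.
        image = await self.handle.generate.remote(prompt, img_size=img_size)
        buffer = BytesIO()
        image.save(buffer, "PNG")
        return Response(content=buffer.getvalue(), media_type="image/png")


@serve.deployment(ray_actor_options={"num_gpus": 1})
class StableDiffusionV2:
    def __init__(self) -> None:
        # Imported lazily so the dependency is only needed on the GPU worker.
        from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

        model_id = "stabilityai/stable-diffusion-2"
        scheduler = EulerDiscreteScheduler.from_pretrained(
            model_id, subfolder="scheduler"
        )
        self.pipe = StableDiffusionPipeline.from_pretrained(
            model_id, scheduler=scheduler
        ).to("cuda")

    def generate(self, prompt: str, img_size: int = 512):
        return self.pipe(prompt, height=img_size, width=img_size).images[0]


entrypoint = APIIngress.bind(StableDiffusionV2.bind())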
Establish a port-forwarding session to the Ray Serve port (8000):
kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
Run the Python script:
python generate_image.py
The script saves the generated image to a file named output.png.
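The generate_image.py script is not reproduced in this page. As a rough sketch (the script in the cloned repository is authoritative), it sends a prompt to the Serve HTTP endpoint through the port-forwarded localhost:8000 and writes the returned PNG bytes to disk; the /imagine path and prompt parameter are assumptions based on the ingress sketch above:
# Sketch of a client like generate_image.py: request an image from the Serve
# endpoint and save the PNG response. The prompt below is only an example.
import requests

prompt = "a cute cat is dancing on the grass"
response = requests.get(
    "http://localhost:8000/imagine",
    params={"prompt": prompt},
    timeout=600,
)
response.raise_for_status()

with open("output.png", "wb") as f:
    f.write(response.content)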
Deploy a RayService
The RayService custom resource manages the lifecycle of a RayCluster resource and a Ray Serve application.
For more information about RayService, see Deploy Ray Serve Applications and Production Guide in the Ray documentation.
To deploy a RayService resource, follow these steps:
Review the ray-service.yaml manifest in the repository. This manifest describes a RayService custom resource.
Apply the manifest to your cluster:
kubectl apply -f ray-service.yaml
Verify that the Service is ready:
kubectl get svc stable-diffusion-serve-svc
The output is similar to the following:
NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m
Configure port-forwarding to the Ray Serve Service:
kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &
Run the Python script from the previous section:
python generate_image.py
The script generates an image similar to the image generated in the previous section.