Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.
Workflows lets you execute the services you need in an order that you define, described using the Workflows syntax.
In this tutorial, you use the Workflows connector for Batch to schedule and run a Batch job that executes six tasks in parallel on two Compute Engine VMs. Using both Batch and Workflows allows you to combine the advantages they offer and efficiently provision and orchestrate the entire process.
Create an Artifact Registry repository
Create a repository to store your Docker container image.
Console
In the Google Cloud console, go to the Repositories page.
Click Create Repository.
Enter containers as the repository name.
For Format, choose Docker.
For Location Type, choose Region.
In the Region list, select us-central1.
Click Create.
gcloud
Run the following command:
gcloud artifacts repositories create containers \
--repository-format=docker \
--location=us-central1
You have created an Artifact Registry repository named containers in the us-central1 region. For more information about supported regions, see Artifact Registry locations.
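To confirm that the repository exists before you push images to it, you can optionally describe it (this check isn't part of the tutorial's required steps):
gcloud artifacts repositories describe containers \
    --location=us-central1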
Get the code samples
Google Cloud stores the application source code for this tutorial in GitHub. You can clone that repository or download the samples.
Clone the sample app repository to your local machine:
git clone https://github.com/GoogleCloudPlatform/batch-samples.git
Alternatively, you can download the samples in the main.zip file and extract it.
Change to the directory that contains the sample code:
cd batch-samples/primegen
You now have the source code for the application in your development environment.
Build the Docker image using Cloud Build
The Dockerfile contains the information needed to build a Docker image using Cloud Build. Run the following command to build it:
gcloud builds submit \
-t us-central1-docker.pkg.dev/PROJECT_ID/containers/primegen-service:v1 PrimeGenService/
Replace PROJECT_ID with your Google Cloud project ID.
When the build is complete, you should see output similar to the following:
DONE
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ID: a54818cc-5d14-467b-bfda-5fc9590af68c
CREATE_TIME: 2022-07-29T01:48:50+00:00
DURATION: 48S
SOURCE: gs://project-name_cloudbuild/source/1659059329.705219-17aee3a424a94679937a7200fab15bcf.tgz
IMAGES: us-central1-docker.pkg.dev/project-name/containers/primegen-service:v1
STATUS: SUCCESS
Using a Dockerfile, you've built a Docker image named primegen-service and pushed the image to the Artifact Registry repository named containers.
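As an optional check (not part of the tutorial's required steps), you can list the images in the repository to confirm the push; replace PROJECT_ID as before:
gcloud artifacts docker images list \
    us-central1-docker.pkg.dev/PROJECT_ID/containers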
Deploy a workflow that schedules and runs a Batch job
The following workflow schedules and runs a Batch job that runs a Docker container as six tasks in parallel on two Compute Engine VMs. The result is the generation of six batches of prime numbers, stored in a Cloud Storage bucket.
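The definition that you deploy in the following steps is written in the Workflows syntax and calls the Workflows connector for Batch. As a rough, illustrative sketch only, and not the tutorial's exact definition, a workflow of this shape might look like the following; the variable names, task group settings, and environment variable are assumptions:
main:
  steps:
    - init:
        assign:
          # Illustrative variable names; derive the job ID and bucket name from a timestamp.
          - projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - region: "us-central1"
          - imageUri: ${region + "-docker.pkg.dev/" + projectId + "/containers/primegen-service:v1"}
          - jobId: ${"job-primegen-" + string(int(sys.now()))}
          - bucket: ${projectId + "-" + jobId}
    - createBucket:
        # Create the Cloud Storage bucket that receives the batches of prime numbers.
        call: googleapis.storage.v1.buckets.insert
        args:
          project: ${projectId}
          body:
            name: ${bucket}
    - createAndRunBatchJob:
        # Create and run the Batch job through the Batch connector.
        call: googleapis.batch.v1.projects.locations.jobs.create
        args:
          parent: ${"projects/" + projectId + "/locations/" + region}
          jobId: ${jobId}
          body:
            taskGroups:
              - taskSpec:
                  runnables:
                    - container:
                        imageUri: ${imageUri}
                      environment:
                        variables:
                          # Hypothetical variable name; the primegen container's actual
                          # configuration may differ.
                          BUCKET: ${bucket}
                # Six tasks in total; the tutorial's actual definition also controls
                # how Batch spreads them across two VMs.
                taskCount: 6
            logsPolicy:
              destination: CLOUD_LOGGING
        result: createAndRunBatchJobResponse
    - returnResult:
        # Return the job ID and bucket name, matching the output shown later in this tutorial.
        return:
          jobId: ${jobId}
          bucket: ${bucket}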
Console
In the Google Cloud console, go to the Workflows page.
Click Create.
Enter a name for the new workflow, such as batch-workflow.
In the Region list, select us-central1.
Select the Service account you previously created.
Click Next.
In the workflow editor, enter the following definition for your workflow:
YAML
JSON
Click Deploy.
gcloud
Create a source code file for your workflow:
touch batch-workflow.JSON_OR_YAML
Replace JSON_OR_YAML with yaml or json, depending on the format of your workflow.
In a text editor, copy the following workflow to your source code file:
YAML
JSON
Deploy the workflow by entering the following command:
gcloud workflows deploy batch-workflow \
    --source=batch-workflow.JSON_OR_YAML \
    --location=us-central1 \
    --service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
Replace JSON_OR_YAML with the file extension you used, SERVICE_ACCOUNT_NAME with the name of the service account you previously created, and PROJECT_ID with your Google Cloud project ID.
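Optionally, you can confirm that the workflow deployed successfully and is active (this check isn't part of the tutorial's required steps):
gcloud workflows describe batch-workflow \
    --location=us-central1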
Execute the workflow
Executing a workflow runs the current workflow definition associated with the workflow.
Console
In the Google Cloud console, go to the Workflows page.
On the Workflows page, click the batch-workflow workflow to go to its details page.
On the Workflow details page, click Execute.
Click Execute again.
The workflow execution should take a few minutes.
View the results of the workflow in the Output pane.
The results should look similar to the following:
{ "bucket": "project-name-job-primegen-TIMESTAMP", "jobId": "job-primegen-TIMESTAMP" }
gcloud
Execute the workflow:
gcloud workflows run batch-workflow \
    --location=us-central1
The workflow execution should take a few minutes.
To get the status of the last completed execution, run the following command:
gcloud workflows executions describe-last
The results should be similar to the following:
name: projects/PROJECT_NUMBER/locations/us-central1/workflows/batch-workflow/executions/EXECUTION_ID
result: '{"bucket":"project-name-job-primegen-TIMESTAMP","jobId":"job-primegen-TIMESTAMP"}'
startTime: '2022-07-29T16:08:39.725306421Z'
state: SUCCEEDED
status:
  currentSteps:
  - routine: main
    step: returnResult
workflowRevisionId: 000001-9ba
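The workflow creates a Batch job named job-primegen-TIMESTAMP. If you want to inspect that job directly, you can optionally describe it with the Batch CLI; replace TIMESTAMP with the value from the workflow output:
gcloud batch jobs describe job-primegen-TIMESTAMP \
    --location=us-central1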
List the objects in the output bucket
You can confirm that the results are as expected by listing the objects in your Cloud Storage output bucket.
Console
In the Google Cloud console, go to the Cloud Storage Buckets page.
In the bucket list, click on the name of the bucket whose contents you want to view.
The results should be similar to the following, with six files in total, and each listing a batch of 10,000 prime numbers:
primes-1-10000.txt
primes-10001-20000.txt
primes-20001-30000.txt
primes-30001-40000.txt
primes-40001-50000.txt
primes-50001-60000.txt
gcloud
Retrieve your output bucket name:
gcloud storage ls
The output is similar to the following:
gs://PROJECT_ID-job-primegen-TIMESTAMP/
List the objects in your output bucket:
gcloud storage ls gs://PROJECT_ID-job-primegen-TIMESTAMP/** --recursive
Replace TIMESTAMP with the timestamp returned by the previous command.
The output should be similar to the following, with six files in total, and each listing a batch of 10,000 prime numbers:
gs://project-name-job-primegen-TIMESTAMP/primes-1-10000.txt
gs://project-name-job-primegen-TIMESTAMP/primes-10001-20000.txt
gs://project-name-job-primegen-TIMESTAMP/primes-20001-30000.txt
gs://project-name-job-primegen-TIMESTAMP/primes-30001-40000.txt
gs://project-name-job-primegen-TIMESTAMP/primes-40001-50000.txt
gs://project-name-job-primegen-TIMESTAMP/primes-50001-60000.txt
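To spot-check the results, you can optionally print the contents of one of the files; replace PROJECT_ID and TIMESTAMP as before:
gcloud storage cat gs://PROJECT_ID-job-primegen-TIMESTAMP/primes-1-10000.txt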