We frequently build the latest Docker image for this project as part of our workflow, and I’d like to suggest that we publish this image to Docker Hub to support a wider set of contributors.
Running the Docker image locally is straightforward, and users on SLURM-based clusters can easily convert it to a Singularity image. In contrast, building a Singularity image from scratch can be time-consuming and error-prone because of dependency mismatches, build tooling complexity, and GPU driver issues, especially since the Singularity image is built less frequently than the Docker image in our workflow.
Publishing a prebuilt Docker image would simplify onboarding and usage significantly.
Here is a simple workflow:
- Push the Docker image to Docker Hub (on any dev machine):
docker tag <local_image_name> <dockerhub_username>/<image_repo_name>
docker push <dockerhub_username>/<image_repo_name>
- On an HPC/SLURM cluster (with fakeroot support):
singularity build --fakeroot <image_repo_name>.sif docker://<dockerhub_username>/<image_repo_name>:latest
- Run a single workflow test:
singularity exec --nv --bind $(pwd):/mnt \
--env XLA_PYTHON_CLIENT_ALLOCATOR=platform \
<image_repo_name>.sif \
python -m tests.reference_algorithm_tests \
--workload=imagenet_resnet \
--framework=jax \
--global_batch_size=16 \
--log_file=/tmp/jax_log.pkl \
--submission_path=tests/modeldiffs/vanilla_sgd_jax.py \
--identical=True \
--tuning_search_space=None \
--num_train_steps=10
- Run all train_diff tests:
singularity exec --nv --bind $(pwd):/mnt \
--env XLA_PYTHON_CLIENT_ALLOCATOR=platform \
<image_repo_name>.sif \
python -m tests.test_traindiffs
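The placeholders in the steps above could be kept in one place so the push and build commands stay consistent. The following is only a sketch: `DOCKERHUB_USER`, `IMAGE_REPO`, `IMAGE_TAG`, their `example_*` defaults, and the helper function names are hypothetical and not part of this project. It only prints the build command rather than running it.

```shell
# Placeholder values (assumptions): replace with your own Docker Hub account and repository.
DOCKERHUB_USER="${DOCKERHUB_USER:-example_user}"
IMAGE_REPO="${IMAGE_REPO:-example_repo}"
IMAGE_TAG="${IMAGE_TAG:-latest}"

# Docker Hub reference, e.g. example_user/example_repo:latest
image_ref() {
  printf '%s/%s:%s\n' "$DOCKERHUB_USER" "$IMAGE_REPO" "$IMAGE_TAG"
}

# Local Singularity image file name, e.g. example_repo.sif
sif_name() {
  printf '%s.sif\n' "$IMAGE_REPO"
}

# Print (do not run) the cluster-side build command from the workflow above.
print_build_cmd() {
  printf 'singularity build --fakeroot %s docker://%s\n' "$(sif_name)" "$(image_ref)"
}

print_build_cmd
```

Once the printed command looks right, it can be copied to the cluster and run there.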