diff --git a/README.md b/README.md
index 22b85b2..7306ced 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,11 @@ _Additional models and pipelines for 🤗 Diffusers created by [Lambda Labs](htt
 - [Stable Diffusion Image Variations](#stable-diffusion-image-variations)
 - [Pokemon text to image](#pokemon-text-to-image)
+
+
+🦄 Other exciting ML projects at Lambda: ML Times, Distributed Training Guide, Text2Video, GPU Benchmark.
+
+
 ## Installation
 ```bash
@@ -125,10 +130,22 @@ cd lambda-diffusers/scripts
 make bench
 ```
+Currently `xformers` does not support H100. The "without xformers" results below are generated by running the benchmark with `--xformers no` (can be set in `scripts/Makefile`)
+
 ### Results
+With [xformers](https://github.com/facebookresearch/xformers), raw data can be found [here](./benchmarks/benchmark.csv).
 ![](./docs/pictures/sd_throughput.png)
+Without [xformers](https://github.com/facebookresearch/xformers), raw data can be found [here](./benchmarks/benchmark_no_xformers.csv).
+![](./docs/pictures/sd_throughput_noxformer.png)
+
+H100 MIG performance, raw data can be found [here](./benchmarks/benchmark_H100_MIG.csv).
+![](./docs/pictures/sd_throughput_mig.png)
+
+Cost analysis
+![](./docs/pictures/cost_analysis.png)
+
 ## Links
 - [Captioned Pokémon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions)
diff --git a/benchmark.csv b/benchmarks/benchmark.csv
similarity index 100%
rename from benchmark.csv
rename to benchmarks/benchmark.csv
diff --git a/benchmarks/benchmark_H100_MIG.csv b/benchmarks/benchmark_H100_MIG.csv
new file mode 100644
index 0000000..87c70dd
--- /dev/null
+++ b/benchmarks/benchmark_H100_MIG.csv
@@ -0,0 +1,65 @@
+device,precision,autocast,xformers,runtime,n_samples,latency,memory,
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,1,1.73,7.7
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,1,1.06,3.46
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,2,2.66,9.79
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,2,1.73,4.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,4,4.47,18.49
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,4,2.63,8.91
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,8,8.16,23.86
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,8,4.97,12.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,16,15.98,42.38
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,16,9.61,29.01
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,32,32.04,80.51
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,32,19.07,55.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,1,2.3,7.74
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,1,1.52,3.45
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,2,3.95,9.48
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,2,2.42,4.57
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,4,7.12,18.2
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,4,4.17,8.9
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,8,13.91,23.75
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,8,7.91,12.49
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,16,15.73,29.01
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 4g.40gb,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,1,4.2,7.76
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,1,2.58,3.41
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,2,7.61,11.09
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,2,4.56,4.59
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,4,14.45,17.65
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,4,8.24,6.78
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,8,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,8,15.81,15.65
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 2g.20gb,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,1,9.17,7.76
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,1,5.39,3.47
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,2,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,2,9.29,4.63
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,4,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,4,17.4,6.8
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,8,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,8,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,128,-1,-1
\ No newline at end of file
diff --git a/benchmarks/benchmark_no_xformers.csv b/benchmarks/benchmark_no_xformers.csv
new file mode 100644
index 0000000..d578b6d
--- /dev/null
+++ b/benchmarks/benchmark_no_xformers.csv
@@ -0,0 +1,97 @@
+device,precision,autocast,xformers,runtime,n_samples,latency,memory,
+NVIDIA A10,single,FALSE,FALSE,pytorch,1,4.75,6.73
+NVIDIA A10,half,FALSE,FALSE,pytorch,1,2.71,3.43
+NVIDIA A10,single,FALSE,FALSE,pytorch,2,8.75,9
+NVIDIA A10,half,FALSE,FALSE,pytorch,2,4.99,5.53
+NVIDIA A10,single,FALSE,FALSE,pytorch,4,17.18,18.14
+NVIDIA A10,half,FALSE,FALSE,pytorch,4,9.65,6.84
+NVIDIA A10,single,FALSE,FALSE,pytorch,8,-1,-1
+NVIDIA A10,half,FALSE,FALSE,pytorch,8,18.58,12.66
+NVIDIA A10,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA A10,half,FALSE,FALSE,pytorch,16,36.32,20.64
+NVIDIA A10,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A10,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A10,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A10,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A10,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A10,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,1,1.72,7.76
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,1,1.18,3.41
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,2,3.03,9.04
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,2,1.88,5.53
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,4,5.53,18.04
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,4,3.35,6.74
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,8,10.95,23.85
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,8,6.28,12.6
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,16,12.57,20.58
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A100-SXM4-40GB,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A100-SXM4-40GB,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,1,1.99,7.76
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,1,1.5,3.45
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,2,3.52,11.11
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,2,2.3,4.53
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,4,6.31,13.98
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,4,4.04,8.91
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,8,12.21,23.91
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,8,7.59,12.75
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,16,-1,-1
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,16,14.54,21.24
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA A100-PCIE-40GB,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A100-PCIE-40GB,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,1,2.05,7.76
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,1,1.53,3.41
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,2,3.09,9.04
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,2,3.06,5.53
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,4,6.34,18.04
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,4,4.57,6.74
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,8,11.16,23.85
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,8,7.91,12.6
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,16,22.59,42.63
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,16,14.22,20.58
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,32,44.02,79.6
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,32,27.73,45.19
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,64,-1.0,-1.0
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,64,55.55,79.54
+NVIDIA A100 80GB PCIe,single,False,False,pytorch,128,-1.0,-1.0
+NVIDIA A100 80GB PCIe,half,False,False,pytorch,128,-1.0,-1.0
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,1,4.15,6.76
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,1,2.43,3.42
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,2,6,11.1
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,2,3.88,4.5
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,4,12.85,13.97
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,4,7.77,8.88
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,8,32.69,23.88
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,8,21.21,12.74
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,16,81.14,42.77
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,16,48.49,21.23
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,32,-1,-1
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA RTX A6000,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA RTX A6000,half,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,1,1.73,7.7
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,1,1.06,3.46
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,2,2.66,9.79
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,2,1.73,4.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,4,4.47,18.49
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,4,2.63,8.91
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,8,8.16,23.86
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,8,4.97,12.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,16,15.98,42.38
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,16,9.61,29.01
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,32,32.04,80.51
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,32,19.07,55.57
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,64,-1,-1
+NVIDIA H100 PCIe,single,FALSE,FALSE,pytorch,128,-1,-1
+NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,128,-1,-1
diff --git a/docs/benchmark-update.md b/docs/benchmark-update.md
index 9ef98a6..b383e01 100644
--- a/docs/benchmark-update.md
+++ b/docs/benchmark-update.md
@@ -16,7 +16,7 @@ Results will be written to `results.csv`, the benchmark will take different amou
 ## Results
-The current results for the benchmark are available in [`benchmark.csv`](../benchmark.csv). These results were run with Diffusers 0.11.0 and xformers using Ubuntu 20.04, Python 3.8, PyTorch 1.13, CUDA 11.8 ([NGC PyTorch container 22.11](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-11.html)).
+The current results for the benchmark are available in [`benchmark.csv`](../benchmarks/benchmark.csv). These results were run with Diffusers 0.11.0 and xformers using Ubuntu 20.04, Python 3.8, PyTorch 1.13, CUDA 11.8 ([NGC PyTorch container 22.11](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-11.html)).
 xformers provides a significant boost in performance and memory consumption allowing large batch sizes to maximise utilisation of GPUs. Our best performance comes using NVIDIA A100-SXM4-40GB on [Lambda GPU cloud](https://cloud.lambdalabs.com), at the maximum batch size tested (128) at half precision we observe a throughput of 1.85 images/second when using DDIM 30 steps for sampling.
diff --git a/docs/pictures/cost_analysis.png b/docs/pictures/cost_analysis.png
new file mode 100644
index 0000000..2b5a473
Binary files /dev/null and b/docs/pictures/cost_analysis.png differ
diff --git a/docs/pictures/sd_throughput_mig.png b/docs/pictures/sd_throughput_mig.png
new file mode 100644
index 0000000..5e813a1
Binary files /dev/null and b/docs/pictures/sd_throughput_mig.png differ
diff --git a/docs/pictures/sd_throughput_noxformer.png b/docs/pictures/sd_throughput_noxformer.png
new file mode 100644
index 0000000..baae962
Binary files /dev/null and b/docs/pictures/sd_throughput_noxformer.png differ
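
Reviewer note: the README plots added in this diff report throughput, while the new CSV files store `n_samples` and `latency` per run. Below is a minimal sketch of how throughput could be derived from those columns. It rests on two assumptions that the diff does not state: that `latency` is the time in seconds for the whole batch of `n_samples` images, and that `-1` marks a configuration that produced no result (for example, a run that ran out of memory). The function name `throughput_rows` is hypothetical and not part of the repository; the sample rows are copied from `benchmarks/benchmark_H100_MIG.csv`.

```python
import csv
import io

# Three rows copied from benchmarks/benchmark_H100_MIG.csv (header included,
# trailing comma and all, exactly as it appears in the diff).
SAMPLE = """device,precision,autocast,xformers,runtime,n_samples,latency,memory,
NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,32,19.07,55.57
NVIDIA H100 PCIe,half,FALSE,FALSE,pytorch,64,-1,-1
NVIDIA H100 PCIe MIG 1g.10gb,half,FALSE,FALSE,pytorch,4,17.4,6.8
"""


def throughput_rows(csv_text):
    """Return (device, precision, n_samples, images_per_second) tuples.

    Assumption: latency is seconds per batch, so throughput is
    n_samples / latency. Rows with a non-positive latency (the -1
    placeholder) are skipped because they carry no measurement.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        latency = float(row["latency"])  # also parses "-1.0"
        if latency <= 0:
            continue
        n_samples = int(row["n_samples"])
        rows.append((row["device"], row["precision"], n_samples, n_samples / latency))
    return rows


if __name__ == "__main__":
    for device, precision, n_samples, ips in throughput_rows(SAMPLE):
        print(f"{device} ({precision}, batch {n_samples}): {ips:.2f} images/s")
```

To run this against a full file, read it with `open(path, newline="").read()` and pass the text to the function. The trailing comma in the CSV header only adds an empty column name, which `csv.DictReader` tolerates.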