As part of optimising our workflow, we try to store build results in the cache so we can reuse them in later builds. However, over time, as builds grow, restoring the build cache slowly comes to dominate the build process (as everything else is cached).
We found a potential improvement that could significantly improve throughput without many code changes: exchanging (p)gzip for Zstd.
For testing, I used a cache artefact from one of our Rust-based builds, which contains the `target/` dir and all of the objects within:
- Archive size shrank from 2.8 GB to 2.4 GB
- Archive list time (measured as `tar -tf`) dropped from 51 to 18 seconds. Listing is basically a sequential scan through the file, so in the limit it should take roughly as long as decompression; it just doesn't write anything.
- Introducing multi-threading gave a further 2× improvement, reducing it to 9 s (a smaller gain than I expected)
- Unpacking times followed a similar pattern, taking 1 s longer than just listing for both `gz` (52 s) and `zstd` (19 s), since they do the same additional work
- Similarly, with multi-threaded decompression, unpacking takes 10 s, one second longer than simply listing the archive (see the sketch after this list)
- Total achieved write speed is 1 GB/s; I don't know if you can do meaningfully better than that without copy-on-write (CoW).
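
For concreteness, here is a minimal sketch of what the multi-threaded zstd listing measured above could look like in Go, assuming the klauspost/compress zstd package (the zstd sibling of the pgzip library mentioned below). The function name and overall structure are illustrative, not taken from this project:

```go
package main

import (
	"archive/tar"
	"fmt"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// listZstdArchive prints the entries of a .tar.zst archive, roughly
// the equivalent of the `tar -tf` measurement above. Unpacking would
// additionally write each entry's contents to disk.
func listZstdArchive(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// The zstd decoder decompresses concurrently by default, which is
	// where the multi-threaded numbers above come from.
	dec, err := zstd.NewReader(f)
	if err != nil {
		return err
	}
	defer dec.Close()

	tr := tar.NewReader(dec)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil // end of archive
		}
		if err != nil {
			return err
		}
		fmt.Println(hdr.Name)
	}
}
```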
Would you consider implementing this? From what we can tell, it should be a relatively straightforward patch: the pgzip library you currently use has an equivalent zstd library with a similar API.
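
To give a sense of how small the writer-side change might be, here is a hedged sketch under the same assumption (klauspost/compress zstd); `compressCache` and `writeEntries` are hypothetical names, not this project's code:

```go
package main

import (
	"archive/tar"
	"io"

	"github.com/klauspost/compress/zstd"
)

// compressCache writes a tar stream as .tar.zst. Where the code
// previously wrapped `out` in pgzip.NewWriter(out), it now wraps it
// in a zstd encoder; the encoder also compresses concurrently by
// default.
func compressCache(out io.Writer, writeEntries func(*tar.Writer) error) error {
	zw, err := zstd.NewWriter(out, zstd.WithEncoderLevel(zstd.SpeedDefault))
	if err != nil {
		return err
	}
	tw := tar.NewWriter(zw)
	if err := writeEntries(tw); err != nil {
		return err
	}
	if err := tw.Close(); err != nil {
		return err
	}
	// Close flushes remaining data and writes the zstd frame footer.
	return zw.Close()
}
```

The encoder level is a tunable: `zstd.SpeedFastest` would trade some compression ratio for even higher throughput.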