Change default compression algorithm for cache from gzip to Zstd #523

@bertptrs

Description

As part of optimising our workflow, we store build results in the cache so we can reuse them in later builds. However, as builds grow over time, restoring the build cache slowly comes to dominate the build process (since everything else is cached).

We found a potential improvement that could significantly improve throughput without many code changes: exchange (p)gzip for Zstd.

For testing, I used a cache artefact from one of our Rust-based builds, which contains the target/ directory and all of the object files within.

  • Archive size shrank from 2.8 GB to 2.4 GB.
  • Archive list time (measured as tar -tf) shrank from 51 to 18 seconds. Listing is basically a sequential scan through the file, so in the limit it should take roughly as long as decompression; it just doesn't write anything. (A Go equivalent of this measurement is sketched after this list.)
  • Introducing multi-threaded decompression gave a further 2× improvement, reducing it to 9 seconds, which is less than I expected.
  • Unpacking times followed a similar pattern: both gzip (52 s) and zstd (19 s) took 1 second longer than just listing, since the additional work (actually writing the files out) is the same for both.
  • Similarly, multi-threaded zstd decompression takes 10 seconds, one second longer than simply listing the archive.
  • The total achieved write speed is 1 GB/s; I don't know if you can do meaningfully better than that without copy-on-write (CoW).
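
For reference, the listing measurement above is roughly equivalent to the following Go harness. This is a minimal sketch, assuming the github.com/klauspost/compress/zstd package; the cache.tar.zst file name is a placeholder. It streams the archive through a tar reader and counts entries, which is what tar -tf does minus the printing:

```go
// Minimal sketch of the "list" measurement: stream a zstd-compressed tar
// archive through a tar reader and time it. The file name is a placeholder.
package main

import (
	"archive/tar"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	"github.com/klauspost/compress/zstd"
)

func listEntries(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	// The decoder is multi-threaded by default; passing
	// zstd.WithDecoderConcurrency(1) would reproduce the
	// single-threaded numbers.
	dec, err := zstd.NewReader(f)
	if err != nil {
		return 0, err
	}
	defer dec.Close()

	// Walk the tar headers without extracting anything: a sequential
	// scan through the archive, just like `tar -tf`.
	tr := tar.NewReader(dec)
	n := 0
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	start := time.Now()
	n, err := listEntries("cache.tar.zst")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("listed %d entries in %s\n", n, time.Since(start))
}
```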

Would you consider implementing this? From what we can tell, it should be a relatively straightforward patch, as the pgzip library you currently use has an equivalent zstd library with a similar API.
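
For what it's worth, here is a minimal sketch of what the swap might look like, assuming the cache code wraps its tar stream in a compressor roughly like this. The klauspost pgzip and zstd packages are my guess at the libraries involved, and the function names are illustrative, not taken from the actual codebase:

```go
// Illustrative sketch of swapping pgzip for zstd; names are hypothetical.
package cache

import (
	"io"

	"github.com/klauspost/compress/zstd"
	// "github.com/klauspost/pgzip" // the current gzip-based compressor
)

// newCompressor wraps dst in a parallel Zstd encoder.
func newCompressor(dst io.Writer) (io.WriteCloser, error) {
	// Before: return pgzip.NewWriter(dst), nil
	// The zstd encoder is concurrent by default, so no extra
	// configuration is needed to get the multi-threaded behaviour.
	return zstd.NewWriter(dst)
}

// newDecompressor wraps src in a multi-threaded Zstd decoder.
func newDecompressor(src io.Reader) (io.ReadCloser, error) {
	// Before: return pgzip.NewReader(src)
	dec, err := zstd.NewReader(src)
	if err != nil {
		return nil, err
	}
	// The decoder's plain Close() returns no error, so wrap it to
	// satisfy io.ReadCloser the way the gzip reader did.
	return dec.IOReadCloser(), nil
}
```

Both the encoder and decoder are concurrent out of the box, which is where the multi-threaded numbers above come from.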
