### Summary

I’ve been evaluating the Rust [`blake3`](https://crates.io/crates/blake3) crate for potential use. Data, code, and benchmarks are available [here](https://github.com/device-mapper-utils/blk-archive/issues/45#issuecomment-3255376966).

### Results

- **x86:** `blake3` is consistently faster than `blake2`.
- **ppcle / s390x / aarch64:** performance is generally slower than `blake2`.
- Rayon parallelism sometimes improves results.
- In some cases, performance is still worse ([example](https://github.com/device-mapper-utils/blk-archive/issues/45#issuecomment-3255485598)).

### Questions

- Is this expected behavior on non-x86 architectures (e.g., SIMD gaps, missing intrinsics)?
- Or is my sample code / benchmarking harness flawed?
- Are there recommended tuning options or build flags for ppcle, s390x, and aarch64?
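
To help separate harness effects from real per-architecture differences, here is a minimal sketch of the kind of timing loop I would expect to give stable numbers. It uses only the standard library. The names `placeholder_hash`, `best_of`, and `mib_per_sec` are hypothetical and not part of `blake3` or `blake2`; the FNV-1a stand-in exists only so the sketch compiles on its own and should be swapped for the real hash calls. It is not the harness from the linked benchmarks.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in for the hash under test (64-bit FNV-1a).
/// Hypothetical placeholder: replace with the real blake2/blake3 call.
fn placeholder_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    h
}

/// Call `f` on `data` `warmup` times untimed, then `runs` times timed,
/// and return the fastest timed run. `black_box` keeps the optimizer
/// from discarding the input or the result.
fn best_of<F, R>(data: &[u8], warmup: usize, runs: usize, mut f: F) -> Duration
where
    F: FnMut(&[u8]) -> R,
{
    for _ in 0..warmup {
        black_box(f(black_box(data)));
    }
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        black_box(f(black_box(data)));
        best = best.min(start.elapsed());
    }
    best
}

/// Convert a byte count and elapsed time into MiB/s.
fn mib_per_sec(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}
```

A call such as `best_of(&buf, 3, 10, placeholder_hash)` followed by `mib_per_sec(buf.len(), best)` gives one throughput figure per input size. Taking the fastest of several runs after a warm-up is a design choice to reduce noise from cold caches and frequency scaling; it assumes an optimized build (`--release`) and a buffer large enough that timer resolution does not dominate.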