Question: Architecture-specific performance differences between blake3 and blake2 #516

Summary

I’ve been evaluating the Rust blake3 crate for potential use. Data, code, and benchmarks are available here.

Results

x86: blake3 is consistently faster than blake2.
ppcle / s390x / aarch64: blake3 is generally slower than blake2.
- Rayon parallelism sometimes improves results.
- In some cases, blake3 is still slower than blake2 even with Rayon (example).
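
Whether Rayon helps, and how large the per-architecture gap is, usually depends strongly on input size, so a harness that sweeps sizes makes the results easier to compare across machines. The sketch below uses only the standard library (`std::hint::black_box` needs Rust 1.66+). It is not the harness from the linked code. `measure_throughput` and `Sample` are hypothetical names, and the hash function is passed in as a closure.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// One measurement: input size in bytes and observed throughput in MiB/s.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub size: usize,
    pub mib_per_s: f64,
}

/// Hypothetical helper: times `hash` on one buffer per entry in `sizes`.
pub fn measure_throughput<R>(
    sizes: &[usize],
    min_time: Duration,
    mut hash: impl FnMut(&[u8]) -> R,
) -> Vec<Sample> {
    let mut samples = Vec::with_capacity(sizes.len());
    for &size in sizes {
        // Fill with a non-constant pattern so the input is not an all-zero buffer.
        let data: Vec<u8> = (0..size).map(|i| (i % 251) as u8).collect();

        // Untimed warm-up calls absorb page faults, cold caches and CPU
        // frequency ramp-up before anything is measured.
        for _ in 0..3 {
            black_box(hash(black_box(data.as_slice())));
        }

        // Repeat until at least `min_time` has elapsed (and at least once),
        // so small inputs are not dominated by timer resolution.
        let mut iters: u64 = 0;
        let start = Instant::now();
        loop {
            // black_box on input and output keeps the optimizer from
            // hoisting or discarding the hashing work.
            black_box(hash(black_box(data.as_slice())));
            iters += 1;
            if start.elapsed() >= min_time {
                break;
            }
        }
        let secs = start.elapsed().as_secs_f64();
        let mib = (size as f64 * iters as f64) / (1024.0 * 1024.0);
        samples.push(Sample { size, mib_per_s: mib / secs });
    }
    samples
}
```

In this setting, the closures could be `|d| blake3::hash(d)` and `|d| blake2::Blake2b512::digest(d)` (with `blake2::Digest` in scope). When the blake3 crate's `rayon` feature is enabled, `|d| blake3::Hasher::new().update_rayon(d).finalize()` could be added as well. Running the same sweep in a `--release` build on each machine shows whether the gap is constant or appears only at certain sizes. Multithreading generally pays off only for large inputs.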

Questions

Is this expected behavior on non-x86 architectures (e.g., SIMD gaps, missing intrinsics)?
Or is my sample code / benchmarking harness flawed?
Are there recommended tuning options or build flags for ppcle, s390x, and aarch64?
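
One way to test the "SIMD gaps" hypothesis on a given machine is to ask the standard library which relevant CPU features it detects at runtime. This sketch assumes that blake3's accelerated backends are the x86 SSE2/SSE4.1/AVX2/AVX-512 and AArch64 NEON implementations, and that ppc64le and s390x fall back to the portable code. `simd_report` is a hypothetical helper. On PowerPC and s390x it returns an empty list, partly because std's feature-detection macros for those targets are not yet stable.

```rust
/// Hypothetical helper: lists runtime-detected CPU features that (by
/// assumption) correspond to blake3's SIMD backends on this target.
fn simd_report() -> Vec<&'static str> {
    let mut found = Vec::new();

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Checked in ascending order of vector width.
        for (name, present) in [
            ("sse2", is_x86_feature_detected!("sse2")),
            ("sse4.1", is_x86_feature_detected!("sse4.1")),
            ("avx2", is_x86_feature_detected!("avx2")),
            ("avx512f", is_x86_feature_detected!("avx512f")),
            ("avx512vl", is_x86_feature_detected!("avx512vl")),
        ] {
            if present {
                found.push(name);
            }
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
    }

    // Other targets (e.g. powerpc64le, s390x): nothing is reported here.
    found
}
```

If this prints SIMD features on x86 and aarch64 but nothing on ppcle and s390x, that would fit the explanation that blake3 runs its portable implementation on the latter two architectures.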
