We have implemented a vectorized version of JavaFastPFOR. In our JMH benchmarks, it shows significant gains over the default, non-vectorized FastPFOR. The source code and the JMH benchmark results are here.
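We'd like to open a PR against this repo but aren't sure how to proceed. Your code doesn't use modules, whereas ours relies on Java's Vector API (to which our team is a major contributor). The Vector API currently ships as an incubator module (jdk.incubator.vector), so it has to be enabled explicitly with --add-modules at both compile time and run time. Please let us know what you think.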
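To make the question more concrete, here is a minimal sketch of one possible integration path. It is only an assumption on our side, not how our code or this repo is currently organized. The idea is to check at run time whether the jdk.incubator.vector module was resolved and fall back to the existing scalar codec otherwise. The class name VectorSupportCheck and the method chooseCodec are hypothetical and exist only for illustration.

```java
public class VectorSupportCheck {

    // True only if the incubating Vector API module is resolved in the boot layer,
    // i.e. the JVM was started with --add-modules jdk.incubator.vector.
    static boolean vectorApiAvailable() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    // Hypothetical selector: a real integration would return a codec instance
    // instead of a label.
    static String chooseCodec() {
        return vectorApiAvailable() ? "vectorized" : "scalar";
    }

    public static void main(String[] args) {
        System.out.println("Selected codec: " + chooseCodec());
    }
}
```

With a check like this, users who do not pass --add-modules would keep getting the existing non-vectorized behaviour, and the vectorized classes could live in a separate source set or artifact. This requires Java 9 or later for ModuleLayer.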
Thanks