Description
What version of regex are you using?
v1.10.2
Describe the bug at a high level.
I'm trying to match 2D patterns inside 2D flattened array of u8
s. For example, to match
0
1
on a 1024x1024 array I'm this pattern: "\\u{0}.{1023}\\u{1}"
. I'm compiling with
regex::bytes::RegexBuilder::new(&string)
.unicode(false)
.dot_matches_new_line(true)
.build()
My expectation is that this kind of pattern would be at-worst slightly slower than the naïve alternative that follows:
enum RegexOp {
Match(Vec<u8>),
Skip(usize)
}
fn regex_match(ops: &[RegexOp], mut slice: &[u8]) -> bool {
for op in ops {
match op {
RegexOp::Skip(skip) => slice = &slice[*skip..],
RegexOp::Match(m) => {
if !slice.starts_with(m) {
return false;
}
slice = &slice[m.len()..];
}
}
}
true
}
In this case ops
looks like [Match([1]), Skip(1023), Match([0])]
.
Let me know if this is too much to ask.
Currently the regex solution takes 27.70 seconds for a 256x256 workload according to time
:
________________________________________________________
Executed in 27.70 secs fish external
usr time 27.65 secs 827.00 micros 27.65 secs
sys time 0.60 secs 60.00 micros 0.60 secs
Whereas a 2-line change from is_match
to my regex_match
function drops this down to 147.17 ms (or 517ms if you're taking the longer value, doesn't really matter).
________________________________________________________
Executed in 147.17 millis fish external
usr time 154.46 millis 663.00 micros 153.80 millis
sys time 517.86 millis 183.00 micros 517.68 millis
I can provide proper benchmarks if you want. I am of-course using release mode and the standard set of regex
crate features.
Please let me know if I've made some mistake in RegexBuilder
or if a seperate set of feature flags might improve things.
What are the steps to reproduce the behavior?
I'm happy to write a test program if that's required for this issue to get attention, but I believe it should be trivial to write one given what I've written above.
What is the actual behavior?
N/A
What is the expected behavior?
N/A
What do you expect the output to be?
N/A
Related: #590 (comment).