Skip to content

Optimize (?!) in regular expressions #106566

Closed
@serhiy-storchaka

Description

@serhiy-storchaka

Some regular expression engines support (*FAIL) as a pattern which fails to match anything. (?!) is an idiomatic way to write this in engines which do not support (*FAIL).

It works pretty well, but it can be optimized. Instead of compiling it as ASSERT_NOT opcode

>>> re.compile(r'12(?!)|3', re.DEBUG)
BRANCH
  LITERAL 49
  LITERAL 50
  ASSERT_NOT 1
OR
  LITERAL 51

 0. INFO 9 0b100 1 2 (to 10)
      in
 5.     LITERAL 0x31 ('1')
 7.     LITERAL 0x33 ('3')
 9.     FAILURE
10: BRANCH 11 (to 22)
12.   LITERAL 0x31 ('1')
14.   LITERAL 0x32 ('2')
16.   ASSERT_NOT 3 0 (to 20)
19.     SUCCESS
20:   JUMP 7 (to 28)
22: branch 5 (to 27)
23.   LITERAL 0x33 ('3')
25.   JUMP 2 (to 28)
27: FAILURE
28: SUCCESS
re.compile('12(?!)|3', re.DEBUG)

it can be compiled as FAILURE opcode.

>>> re.compile(r'12(?!)|3', re.DEBUG)
BRANCH
  LITERAL 49
  LITERAL 50
  FAILURE
OR
  LITERAL 51

 0. INFO 9 0b100 1 2 (to 10)
      in
 5.     LITERAL 0x31 ('1')
 7.     LITERAL 0x33 ('3')
 9.     FAILURE
10: BRANCH 8 (to 19)
12.   LITERAL 0x31 ('1')
14.   LITERAL 0x32 ('2')
16.   FAILURE
17.   JUMP 7 (to 25)
19: branch 5 (to 24)
20.   LITERAL 0x33 ('3')
22.   JUMP 2 (to 25)
24: FAILURE
25: SUCCESS
re.compile('12(?!)|3', re.DEBUG)

Unfortunately I do not know good examples of using (*FAIL) in regular expressions (without using (*SKIP)) to include them in the documentation. Perhaps other patterns of using (*FAIL) could be optimized future, but I do not know what to optimize.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions