Closed
Bug report
Bug description:
It seems that SRE ignores the ASCII flag when compiling a case-insensitive character range whose upper bound lies outside the BMP (above U+FFFF):
>>> import re
# should match
>>> regex = re.compile("[\ua7aa-\uffff]", re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>
# should not match because of re.ASCII, but does
>>> regex = re.compile("[\ua7aa-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>
# must be related to case folding, since \ua7aa lowercases to \u0266:
# starting the range one code point later no longer matches
>>> regex = re.compile("[\ua7ab-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None
# correct behavior when upper bound is in BMP
>>> regex = re.compile("[\ua7aa-\uffff]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None
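The session above can be condensed into a small script that runs the same patterns side by side. This is only a sketch for reproducing the report: the helper `matches` and the labels in `results` are names made up here, not part of `re`. The last entry is the one in question, and its value presumably depends on whether the interpreter includes the linked fixes.

```python
import re

TARGET = "\u0266"  # ɦ, the lowercase counterpart of U+A7AA


def matches(pattern, flags):
    """Hypothetical helper: True if TARGET matches the character class."""
    return re.compile(pattern, flags).match(TARGET) is not None


# U+A7AA lowercases to U+0266, which is why these ranges are
# sensitive to case-insensitive matching at all.
case_pair = "\ua7aa".lower() == TARGET

results = {
    # Unicode matching: the range contains U+A7AA, so ɦ should match.
    "unicode, BMP bound": matches("[\ua7aa-\uffff]", re.IGNORECASE),
    # ASCII matching: only ASCII letters are case-folded, so no match.
    "ascii, BMP bound": matches("[\ua7aa-\uffff]", re.ASCII | re.IGNORECASE),
    # The reported case: the upper bound lies outside the BMP.
    "ascii, non-BMP bound": matches(
        "[\ua7aa-\U00010000]", re.ASCII | re.IGNORECASE
    ),
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

If the last line prints `True`, the interpreter shows the behavior described in this report; `False` is the expected result under `re.ASCII`.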
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs
- gh-126505: Do not use Unicode case folding in ASCII regexes #126544
- gh-126505: Fix bugs in compiling case-insensitive character classes #126557
- [3.13] gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557) #126689
- [3.12] gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557) #126690