SRE ignores the ASCII flag on character ranges with non-BMP upper bound #126505

Bug report

Bug description:

It seems that SRE ignores the re.ASCII flag for a case-insensitive character range whose upper bound lies outside the Basic Multilingual Plane (above U+FFFF):
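For reference, the Python documentation for re.IGNORECASE says that full Unicode matching (such as Ü matching ü) works unless re.ASCII is used to disable non-ASCII matches. The sketch below is a simplified, hypothetical model of that expectation for a single range. The helper names case_variants and expected_range_match are made up for illustration, and this is not how SRE implements range matching. A character matches [lo-hi] if one of its single-character case variants falls inside the range, and under re.ASCII a non-ASCII character has no variant besides itself.

```python
def case_variants(ch: str, ascii_only: bool) -> set[str]:
    """Hypothetical helper: single-character case variants of ch.

    With ascii_only=True (mirroring re.ASCII), a non-ASCII character is
    only ever a variant of itself.
    """
    if ascii_only and not ch.isascii():
        return {ch}
    candidates = {ch, ch.lower(), ch.upper()}
    # Drop multi-character mappings such as "ß".upper() == "SS".
    return {c for c in candidates if len(c) == 1}


def expected_range_match(ch: str, lo: str, hi: str, ascii_only: bool) -> bool:
    """Hypothetical model of [lo-hi] under IGNORECASE: ch matches if any of
    its case variants lies in the inclusive code point range lo..hi."""
    return any(ord(lo) <= ord(c) <= ord(hi) for c in case_variants(ch, ascii_only))
```

Under this model, "\u0266" (ɦ) falls in [\ua7aa-\U00010000] only without re.ASCII, because its uppercase form is U+A7AA (Ɦ); with re.ASCII it has no variants and U+0266 lies below the range.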

```pycon
>>> import re

>>> # should match
>>> regex = re.compile("[\ua7aa-\uffff]", re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>

>>> # should not match
>>> regex = re.compile("[\ua7aa-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>

>>> # likely related to case folding, since \ua7aa (Ɦ) lowercases to \u0266 (ɦ)
>>> regex = re.compile("[\ua7ab-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None

>>> # correct behavior when the upper bound is in the BMP
>>> regex = re.compile("[\ua7aa-\uffff]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None
```
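The four cases above can be bundled into a standalone script that prints expected versus actual results, which may help when checking other CPython builds. This is a sketch: the CASES table and the check_cases function are hypothetical names, and the expected values encode the reading of re.ASCII argued in this report rather than output captured from a fixed interpreter.

```python
import re

# (pattern, flags, expected_match) for the four cases in the report.
# Expectations assume re.ASCII stops IGNORECASE from treating non-ASCII
# characters as case variants of each other.
CASES = [
    ("[\ua7aa-\uffff]", re.IGNORECASE, True),
    ("[\ua7aa-\U00010000]", re.ASCII | re.IGNORECASE, False),
    ("[\ua7ab-\U00010000]", re.ASCII | re.IGNORECASE, False),
    ("[\ua7aa-\uffff]", re.ASCII | re.IGNORECASE, False),
]

SUBJECT = "\u0266"  # ɦ, the lowercase counterpart of U+A7AA (Ɦ)


def check_cases():
    """Return (pattern, flags, expected, actual) for every case."""
    results = []
    for pattern, flags, expected in CASES:
        actual = re.match(pattern, SUBJECT, flags) is not None
        results.append((pattern, flags, expected, actual))
    return results


if __name__ == "__main__":
    for pattern, flags, expected, actual in check_cases():
        status = "ok" if expected == actual else "MISMATCH"
        print(f"{pattern!a:30} {flags!r:40} expected={expected} actual={actual} {status}")
```

Going by the transcript, an affected build should report a mismatch only for the second case, the one with the non-BMP upper bound.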
