Add support for Python underscores in numeric literals #3038

@marvintensuan

Information

  • Language: Python
  • Plugins: none

Description
Python 3.6+ supports underscores in numeric literals (PEP 515). They act as visual separators that make long numbers easier to read and do not change the value.

>>> 2_021
2021
>>> 20_21
2021
>>> 0b_1111_1100_101
2021
>>> 0o_3745
2021
>>> 0x7_e5
2021
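The session above can also be checked programmatically. This is a minimal sketch, assuming Python 3.6 or later; the helper name `parses` is made up for illustration and is not part of any library.

```python
# Every literal from the session above denotes the same integer.
values = [2_021, 20_21, 0b_1111_1100_101, 0o_3745, 0x7_e5]
assert all(v == 2021 for v in values)

# int() and float() accept the same grouping when parsing strings.
assert int("2_021") == 2021
assert int("0x7_e5", 0) == 2021  # base 0 infers the base from the prefix
assert float("2_0.2_1") == 20.21


def parses(text):
    """Hypothetical helper: True if int() accepts `text` with base 0."""
    try:
        int(text, 0)
        return True
    except ValueError:
        return False


# Underscores must sit between digits (or right after a base prefix):
# doubled, leading, and trailing underscores are rejected.
assert not parses("2__021")
assert not parses("2021_")
assert not parses("_2021")
```

A highlighter only needs to recognize the valid placements, so the rejected forms are useful negative test cases.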

Code snippet

prism-python.js uses the following regex to match numbers:

'number': /(?:\b(?=\d)|\B(?=\.))(?:0[bo])?(?:(?:\d|0x[\da-f])[\da-f]*(?:\.\d*)?|\.\d+)(?:e[+-]?\d+)?j?\b/i,

We can break this down as follows:

/(?:\b(?=\d)|\B(?=\.))
(?:0[bo])?
(?:
    (?:\d|0x[\da-f])
    [\da-f]*
    (?:\.\d*)?  |  \.\d+
)
(?:e[+-]?\d+)?
j?
\b/i
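To see where the current pattern stops working, here is a rough sketch that runs it through Python's `re` module instead of JavaScript. The pattern is transcribed as-is with the `i` flag mapped to `re.IGNORECASE`; the names `CURRENT` and `tokens` are assumptions made for this example, and the two regex engines may differ in corner cases.

```python
import re

# Current Prism number pattern, transcribed for Python's re module.
CURRENT = re.compile(
    r"(?:\b(?=\d)|\B(?=\.))(?:0[bo])?"
    r"(?:(?:\d|0x[\da-f])[\da-f]*(?:\.\d*)?|\.\d+)"
    r"(?:e[+-]?\d+)?j?\b",
    re.IGNORECASE,
)


def tokens(pattern, line):
    """Hypothetical helper: every non-empty match of `pattern` in `line`."""
    return [m.group(0) for m in pattern.finditer(line) if m.group(0)]


print(tokens(CURRENT, "100000 + 2_000"))                   # ['100000']
print(tokens(CURRENT, "0b1000 + 0b_0011_1111_0100_1110"))  # ['0b1000']
print(tokens(CURRENT, "0x01af + 0xcafe_f00d"))             # ['0x01af']
```

The underscore counts as a word character, so the trailing `\b` never finds a boundary inside a literal such as `2_000`, and the whole literal is left unmatched.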

I would propose the following modifications:

/(?:\b(?=\d)|\B(?=\.))
(?:0[bo](_)?)?
(?:
    (?:\d|0x(_)?[\da-f])
    ([\da-f]|[\da-f]_)*
    (?:\.\d*)?  |  \.\d+  |  ((\d+_)*(\.)?(\d+)?)*
)
(?:e[+-]?\d+)?
([^_]j)?
\b/i

There are five modifications here:

  • Underscores between digits: (?:\.\d*)? | \.\d+ --> (?:\.\d*)? | \.\d+ | ((\d+_)*(\.)?(\d+)?)*
  • Underscores between hexadecimal digits: [\da-f]* --> ([\da-f]|[\da-f]_)*
  • Underscore after 0b and 0o, e.g. 0b_0001 and 0o_754: (?:0[bo])? --> (?:0[bo](_)?)?
  • Underscore after 0x, e.g. 0x_badface: (?:\d|0x[\da-f]) --> (?:\d|0x(_)?[\da-f])
  • Suppress an underscore directly before j, e.g. 4_2j ✔️ 42_j ❌: j? --> ([^_]j)?

In one line, it should look like this:

/(?:\b(?=\d)|\B(?=\.))(?:0[bo](_)?)?(?:(?:\d|0x(_)?[\da-f])([\da-f]|[\da-f]_)*(?:\.\d*)?|\.\d+|((\d+_)*(\.)?(\d+)?)*)(?:e[+-]?\d+)?([^_]j)?\b/i
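The one-line pattern can be tried out before touching Prism. The sketch below runs it through Python's `re` module rather than JavaScript, so it is only an approximation of what Prism would do; the names `PROPOSED` and `tokens` are made up for this example.

```python
import re

# Proposed pattern, transcribed for Python's re module (flag i -> re.IGNORECASE).
PROPOSED = re.compile(
    r"(?:\b(?=\d)|\B(?=\.))(?:0[bo](_)?)?"
    r"(?:(?:\d|0x(_)?[\da-f])([\da-f]|[\da-f]_)*(?:\.\d*)?"
    r"|\.\d+|((\d+_)*(\.)?(\d+)?)*)"
    r"(?:e[+-]?\d+)?([^_]j)?\b",
    re.IGNORECASE,
)


def tokens(pattern, line):
    """Hypothetical helper: every non-empty match of `pattern` in `line`."""
    return [m.group(0) for m in pattern.finditer(line) if m.group(0)]


# The three lines from the test page are now matched in full.
print(tokens(PROPOSED, "100000 + 2_000"))        # ['100000', '2_000']
print(tokens(PROPOSED, "0b1000 + 0b_0011_1111_0100_1110"))
print(tokens(PROPOSED, "0x01af + 0xcafe_f00d"))  # ['0x01af', '0xcafe_f00d']

# Imaginary literals: 4_2j is accepted, 42_j yields no non-empty match.
print(tokens(PROPOSED, "4_2j"))                  # ['4_2j']
print(tokens(PROPOSED, "42_j"))                  # []

# Two edge cases that may deserve a second look.
print(repr(PROPOSED.match("42_j").group(0)))     # '' (an empty match)
print(tokens(PROPOSED, "1,j"))                   # ['1,j']
```

In this sketch, two side effects show up. Every part after the initial lookahead is optional, so the pattern can succeed with an empty match, as it does on `42_j`. The `[^_]` in `([^_]j)?` consumes whatever character precedes `j`, so `1,j` is matched as a single number. Whether either matters inside Prism's tokenizer is not verified here.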

Test page

The following code is highlighted incorrectly:
100000 + 2_000
0b1000 + 0b_0011_1111_0100_1110
0x01af + 0xcafe_f00d

EDIT: added more modifications.
