gh-91146: More reduce allocation size of list from str.split/rsplit #95493

corona10 · 2022-07-31T04:50:19Z

Memory size

AS-IS `50b2261`


>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
88
>>> s = "12345".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
136
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

TO-BE (Allocation is reduced & No regression)

>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
80
>>> s = "12345".split()
>>> sys.getsizeof(s)
88
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280
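
The byte counts above can be turned into allocated slot counts, which makes the before/after comparison easier to read. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` of a list grows by one pointer per allocated slot); the helper name `list_capacity` is hypothetical and not part of this PR. On a 64-bit build where an empty list is 56 bytes and a pointer is 8 bytes, 88 bytes would correspond to 4 slots and 104 bytes to 6 slots.

```python
import struct
import sys

# Size of one element slot (a pointer) on this build.
POINTER_SIZE = struct.calcsize("P")
# Size of a list with no allocated slots; assumption: CPython layout.
EMPTY_LIST_SIZE = sys.getsizeof([])


def list_capacity(lst):
    """Return how many element slots are currently allocated for lst.

    Hypothetical helper for illustration; relies on CPython's getsizeof.
    """
    return (sys.getsizeof(lst) - EMPTY_LIST_SIZE) // POINTER_SIZE


samples = [
    "1 2",
    "12345",
    "1 2 3 4 5",
    "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5",
]
for text in samples:
    parts = text.split()
    # Prints: number of items, allocated slots, total bytes.
    print(len(parts), list_capacity(parts), sys.getsizeof(parts))
```

The printed values depend on the interpreter version and platform, so they are not reproduced here; the gap between the item count and the slot count is the overallocation this PR targets.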

Performance: No regression

Script

import pyperf

def bench_split_tiny():
    s = "1 2".split()

def bench_split_small():
    s = "1 2 3 4 5".split()

def bench_split_larger():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()

def bench_split_tiny_append():
    s = "1 2".split()
    for e in range(100):
        s.append("9")

def bench_split_small_append():
    s = "1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

def bench_split_larger_append():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

runner = pyperf.Runner()
runner.bench_func('bench_split_tiny', bench_split_tiny)
runner.bench_func('bench_split_small', bench_split_small)
runner.bench_func('bench_split_larger', bench_split_larger)
runner.bench_func('bench_split_tiny_append', bench_split_tiny_append)
runner.bench_func('bench_split_small_append', bench_split_small_append)
runner.bench_func('bench_split_larger_append', bench_split_larger_append)

Result

| Benchmark | origin (before `50b2261`) | `50b2261` | PR |
|---|---|---|---|
| bench_split_small | 63.3 ns | 62.5 ns: 1.01x faster | 62.5 ns: 1.01x faster |
| bench_split_larger | 164 ns | 166 ns: 1.01x slower | not significant |
| bench_split_small_append | 1.61 us | 1.56 us: 1.03x faster | not significant |
| Geometric mean | (ref) | 1.00x faster | 1.00x faster |

Benchmark hidden because not significant (3): bench_split_tiny, bench_split_tiny_append, bench_split_larger_append

Issue: Surprising list overallocation from .split() #91146


corona10 · 2022-07-31T04:55:57Z

@methane san,

I adopted your suggestion from #95473 (comment).
Since memory usage is the more important concern here, I compared against the state before 50b2261, and this patch has no performance impact.
IMO, your suggestion is the better fit for this issue :)

Summary:

- Memory usage win
- No performance impact

PTAL.
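
The diff is not reproduced in this conversation, so the following is only a hypothetical model of how a tighter preallocation could be derived from the string length: bound the number of pieces, then cap the preallocated slots. The function names, the cap of 12, and the "bound + 1" rule are all assumptions made for illustration, not a quotation of `Objects/unicodeobject.c`.

```python
# Assumed cap on how many slots are preallocated up front (illustrative value).
MAX_PREALLOC = 12


def estimated_max_parts(length, sep_length=None):
    """Upper bound on the number of pieces a split can produce.

    Hypothetical model, not the PR's code.
    - Whitespace split: each piece needs at least one character and pieces
      are separated by at least one whitespace, so at most ceil(length / 2).
    - Explicit separator: at most length // sep_length separators fit,
      giving one more piece than that.
    """
    if sep_length is None:
        return (length + 1) // 2
    if sep_length == 0:
        # str.split("") raises ValueError, so there is nothing to estimate.
        raise ValueError("empty separator")
    return length // sep_length + 1


def preallocated_slots(max_parts):
    """Slots to reserve up front under the assumed capping rule."""
    return MAX_PREALLOC if max_parts >= MAX_PREALLOC else max_parts + 1


for text in ["1 2", "12345", "1 2 3 4 5"]:
    bound = estimated_max_parts(len(text))
    print(repr(text), len(text.split()), bound, preallocated_slots(bound))
```

Under these assumptions, plus a 56-byte empty list and 8-byte pointers, the three short examples would get 3, 4, and 6 slots, i.e. 80, 88, and 104 bytes, which agrees with the TO-BE figures in the description. That agreement is suggestive only; the merged change should be checked against the actual diff.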

Objects/unicodeobject.c

corona10 · 2022-07-31T11:54:03Z

@methane san
Thanks for the sharp code review!

bedevere-bot added the awaiting core review label Jul 31, 2022

corona10 requested a review from methane July 31, 2022 04:50

corona10 added skip news and removed skip news labels Jul 31, 2022

corona10 and others added 2 commits July 31, 2022 13:52

pythongh-91146: More reduce allocation size of list from str.split/rs…

48ad552

…plit Co-authored-by: Inada Naoki <[email protected]>

Update NEWS.d

605e0cf

corona10 force-pushed the gh-91146-opt branch from 957cb6b to 605e0cf Compare July 31, 2022 04:52

corona10 changed the title ~~gh-91146: Morer educe allocation size of list from str.split/rsplit~~ gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022

corona10 added the DO-NOT-MERGE label Jul 31, 2022


Handle devide by zero

cab2a61

corona10 force-pushed the gh-91146-opt branch from ce48f99 to cab2a61 Compare July 31, 2022 05:25

corona10 removed the DO-NOT-MERGE label Jul 31, 2022

Add comment for len2 == 0

2ebc918

corona10 force-pushed the gh-91146-opt branch from 03cb7c8 to 2ebc918 Compare July 31, 2022 06:06

methane reviewed Jul 31, 2022

Objects/unicodeobject.c (resolved)

Objects/unicodeobject.c (outdated, resolved)

Objects/unicodeobject.c (resolved)

Avoid overflow error

b5edcd8

corona10 changed the title ~~gh-91146: More reduce allocation size of list from str.split/rsplit~~ [WIP] gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022

Handle overflow case

7f248ef

corona10 changed the title ~~[WIP] gh-91146: More reduce allocation size of list from str.split/rsplit~~ gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022

corona10 requested a review from methane July 31, 2022 11:39

methane approved these changes Aug 1, 2022


bedevere-bot added awaiting merge and removed awaiting core review labels Aug 1, 2022

corona10 merged commit fb75d01 into python:main Aug 1, 2022

bedevere-bot removed the awaiting merge label Aug 1, 2022

corona10 deleted the gh-91146-opt branch August 1, 2022 13:15

corona10 added the performance Performance or resource usage label Aug 3, 2022
