Skip to content

gh-91146: More reduce allocation size of list from str.split/rsplit #95493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 1, 2022

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Jul 31, 2022

Memory size

AS-IS 50b2261


>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
88
>>> s = "12345".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
136
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

TO-BE (Allocation is reduced & No regression)

>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
80
>>> s = "12345".split()
>>> sys.getsizeof(s)
88
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

Performance: No regression

Script

import pyperf

def bench_split_tiny():
    s = "1 2".split()

def bench_split_small():
    s = "1 2 3 4 5".split()

def bench_split_larger():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()

def bench_split_tiny_append():
    s = "1 2".split()
    for e in range(100):
        s.append("9")

def bench_split_small_append():
    s = "1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

def bench_split_larger_append():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

runner = pyperf.Runner()
runner.bench_func('bench_split_tiny', bench_split_tiny)
runner.bench_func('bench_split_small', bench_split_small)
runner.bench_func('bench_split_larger', bench_split_larger)
runner.bench_func('bench_split_tiny_append', bench_split_tiny_append)
runner.bench_func('bench_split_small_append', bench_split_small_append)
runner.bench_func('bench_split_larger_append', bench_split_larger_append)

Result

Benchmark origin (before 50b2261) 50b2261 PR
bench_split_small 63.3 ns 62.5 ns: 1.01x faster 62.5 ns: 1.01x faster
bench_split_larger 164 ns 166 ns: 1.01x slower not significant
bench_split_small_append 1.61 us 1.56 us: 1.03x faster not significant
Geometric mean (ref) 1.00x faster 1.00x faster

Benchmark hidden because not significant (3): bench_split_tiny, bench_split_tiny_append, bench_split_larger_append

@corona10
Copy link
Member Author

@methane san,

I adopt your suggestion from #95473 (comment).
When considering that focusing on memory usage is more important, there is no performance impact with this patch when comparing the performance before 50b2261.
IMO, your suggestion is more proper for this issue :)

summary:

  1. Memory usage win
  2. No performance impact

PTAL.

@corona10 corona10 changed the title gh-91146: Morer educe allocation size of list from str.split/rsplit gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022
@corona10

This comment was marked as resolved.

@corona10 corona10 changed the title gh-91146: More reduce allocation size of list from str.split/rsplit [WIP] gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022
@corona10 corona10 changed the title [WIP] gh-91146: More reduce allocation size of list from str.split/rsplit gh-91146: More reduce allocation size of list from str.split/rsplit Jul 31, 2022
@corona10 corona10 requested a review from methane July 31, 2022 11:39
@corona10
Copy link
Member Author

corona10 commented Jul 31, 2022

@methane san
Thanks for the sharp code review!

@corona10 corona10 merged commit fb75d01 into python:main Aug 1, 2022
@corona10 corona10 deleted the gh-91146-opt branch August 1, 2022 13:15
@corona10 corona10 added the performance Performance or resource usage label Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants