GH-133136: Revise QSBR to reduce excess memory held #135473

nascheme · 2025-06-13T15:10:52Z

This is a refinement of GH-135107. Additional changes:

- track the size of the mimalloc pages that are deferred
- introduce _Py_qsbr_advance_with_size() to reduce duplicated code
- adjust the logic for when we advance the global write sequence and when we process the queue of deferred memory
- small fix for the goal returned in the advance case: it is safe to return the new global write sequence rather than the next write sequence
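The size tracking and the goal fix can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the code in this PR: qsbr_shared, qsbr_thread, qsbr_advance(), qsbr_advance_with_size(), the step of 2 and the 1 MiB ADVANCE_LIMIT are all hypothetical stand-ins, and the real _Py_qsbr_advance_with_size() signature may differ. It shows that once the helper actually advances, the new write sequence is itself a valid goal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: names, fields, the step and the threshold are
   illustrative assumptions, not CPython's actual definitions. */
#define QSBR_INCR 2                          /* write sequence step (assumed) */
#define ADVANCE_LIMIT ((size_t)1 << 20)      /* assumed 1 MiB threshold */

typedef struct {
    uint64_t wr_seq;                         /* global write sequence */
} qsbr_shared;

typedef struct {
    qsbr_shared *shared;
    size_t memory_deferred;                  /* bytes queued since last processing */
} qsbr_thread;

/* Advance the global write sequence and return the new value. */
static uint64_t qsbr_advance(qsbr_shared *shared)
{
    shared->wr_seq += QSBR_INCR;
    return shared->wr_seq;
}

/* Record `size` deferred bytes and return the goal the block must wait for.
   Only advance once enough memory is pending; otherwise return the next
   write sequence, which a later advance will publish. */
static uint64_t qsbr_advance_with_size(qsbr_thread *t, size_t size)
{
    t->memory_deferred += size;
    if (t->memory_deferred >= ADVANCE_LIMIT) {
        /* A reader can only report the new write sequence after passing a
           quiescent state following this advance, so it is a safe goal. */
        return qsbr_advance(t->shared);
    }
    return t->shared->wr_seq + QSBR_INCR;
}
```

Starting from wr_seq = 1, a small deferral returns goal 3 without touching the global sequence, while a deferral that crosses the limit advances wr_seq to 3 and returns that same value; returning 5 (the next sequence after the advance) would only delay freeing.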

With these changes, the memory held by QSBR is typically freed a bit more quickly and the process RSS stays a bit smaller.

Regarding the changes to advancing and processing, GH-135107 has the following minor issue: if the memory threshold is exceeded when free_delayed() adds a new item, we immediately set memory_deferred = 0 and process the queue. It is very unlikely that the goal for the newly added item has been reached at that point. If that item is a large chunk of memory, it cannot actually be freed until the next processing pass. This PR tries to avoid that by storing seq (the local read sequence) as it was at the last processing time. If it hasn't changed (this thread hasn't passed through a quiescent state since then), we wait before processing. This at least gives other readers a chance to catch up, so that processing can actually free something.
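A minimal sketch of that gating, assuming a per-thread record with hypothetical fields seq, seq_at_last_process and memory_deferred, a hypothetical 1 MiB PROCESS_LIMIT, and hypothetical helpers quiescent_state(), should_process() and process_queue(). The actual freeing of items whose goal has been reached is omitted; the sketch only models when processing is allowed to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROCESS_LIMIT ((size_t)1 << 20)   /* assumed 1 MiB threshold */

/* Hypothetical per-thread QSBR state. */
typedef struct {
    uint64_t seq;                  /* local read sequence */
    uint64_t seq_at_last_process;  /* value of seq at the last processing pass */
    size_t memory_deferred;        /* bytes waiting in the deferred queue */
} qsbr_thread;

/* At a quiescent state the thread copies the global write sequence. */
static void quiescent_state(qsbr_thread *t, uint64_t global_wr_seq)
{
    t->seq = global_wr_seq;
}

/* Process only if enough memory is pending AND this thread has passed a
   quiescent state since the last pass; otherwise newly queued items almost
   certainly have not reached their goal yet. */
static bool should_process(const qsbr_thread *t)
{
    if (t->memory_deferred < PROCESS_LIMIT) {
        return false;
    }
    return t->seq != t->seq_at_last_process;
}

/* Freeing of items whose goal is reached is omitted in this sketch; it only
   records the sequence seen at this pass and resets the byte counter. */
static void process_queue(qsbr_thread *t)
{
    t->seq_at_last_process = t->seq;
    t->memory_deferred = 0;
}
```

With 2 MiB pending but an unchanged seq, should_process() returns false; after quiescent_state() moves seq forward, it returns true.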

This PR also changes how often we can defer advancing the global write sequence. Previously, we deferred it up to 10 times. However, I think there is little benefit to advancing it unless we are nearly ready to process, so should_advance_qsbr() now checks whether it seems time to process. _Py_qsbr_should_process() checks whether the local read sequence has been updated. If it has, the write sequence has advanced (so it's time to process) and this thread's read sequence has advanced too. This doesn't tell us whether the other threads have advanced their read sequences, but we don't want to pay the cost of checking that, which would require a poll.
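The cost trade-off between the cheap local check and a full poll can be sketched as follows. This is a simplified model, not CPython's _Py_qsbr_should_process() or its poll: poll_goal_reached() and local_seq_advanced() are hypothetical helpers, and real implementations also handle detached threads, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full poll: a goal is reached only when every reader's
   sequence is at least the goal. Cost grows with the number of threads. */
static bool poll_goal_reached(uint64_t goal, const uint64_t *read_seqs,
                              size_t nthreads)
{
    for (size_t i = 0; i < nthreads; i++) {
        if (read_seqs[i] < goal) {
            return false;   /* this reader may still hold a reference */
        }
    }
    return true;
}

/* Hypothetical cheap check: looks only at this thread's own sequence.
   Necessary for progress, but not sufficient: other readers may lag. */
static bool local_seq_advanced(uint64_t my_seq, uint64_t seq_at_last_process)
{
    return my_seq != seq_at_last_process;
}
```

For example, if this thread's sequence moved from 3 to 5 while another reader is still at 3, the local check passes but a poll for goal 5 fails; processing at that point would free nothing queued with goal 5.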

pyperformance memory usage results

Issue: Memory keeps increasing with fixed-size dict during multi-threaded set/delete in 3.13.3t #133136

The free-threading build uses QSBR to delay freeing dictionary keys and list arrays when the objects are accessed by multiple threads, so that concurrent reads can proceed without holding the object lock. The requests are processed in batches to reduce execution overhead, but for large memory blocks this can lead to excess memory usage. Take the size of the memory block into account when deciding when to process QSBR requests.
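Why batch size alone is a poor trigger for large blocks can be shown with a toy calculation. The sketch assumes a batch that is processed once it holds COUNT_LIMIT items, optionally also once it holds BYTE_LIMIT bytes; both limits, the deferred_batch type and the helper functions are hypothetical and do not describe CPython's actual queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batching policy; both limits are assumed values. */
#define COUNT_LIMIT 254
#define BYTE_LIMIT ((size_t)1 << 20)   /* 1 MiB */

typedef struct {
    size_t count;   /* deferred items in the current batch */
    size_t bytes;   /* total size of those items */
} deferred_batch;

/* Ready once enough items are queued, or, when size-aware, enough bytes. */
static bool batch_ready(const deferred_batch *b, bool size_aware)
{
    if (b->count >= COUNT_LIMIT) {
        return true;
    }
    return size_aware && b->bytes >= BYTE_LIMIT;
}

/* Queue blocks of `block_size` bytes until the batch is ready and return
   how many bytes are held at that point. */
static size_t bytes_held_before_processing(size_t block_size, bool size_aware)
{
    deferred_batch b = { 0, 0 };
    do {
        b.count++;
        b.bytes += block_size;
    } while (!batch_ready(&b, size_aware));
    return b.bytes;
}
```

For 64-byte blocks both policies hold 254 × 64 = 16256 bytes before processing. For 4 MiB blocks, the count-only policy holds 254 × 4 MiB = 1016 MiB, while the size-aware policy processes after a single 4 MiB block.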

colesbury and others added 3 commits June 3, 2025 21:29

Fix unused function warning

ce9232b

Re-work QSBR deferred advance and processing.

3978e35

bedevere-app bot mentioned this pull request Jun 13, 2025

Memory keeps increasing with fixed-size dict during multi-threaded set/delete in 3.13.3t #133136

Open

nascheme added the topic-free-threading label Jun 13, 2025
