Description
For now, I can only occasionally observe the segfault on GitHub Actions. This issue is not easy to reproduce, but I tried to understand its cause.
The direct cause would be in deduce_unreachable in gcmodule.c. In that function, the gc tries to find cycles by traversing objects, including frames, which use _PyFrame_Traverse to visit all their objects. _PyFrame_Traverse uses frame->stacktop as the index range for all the locals and temporary data on the stack (not sure if that's on purpose). However, frame->stacktop is not updated in real time, which means some of the objects it traverses might no longer be valid.
For example, in the FOR_ITER dispatch code, there's a Py_DECREF(iter); STACK_SHRINK(1); when the iterator is exhausted. However, STACK_SHRINK only adjusts the local stack_pointer, not frame->stacktop. If garbage collection runs at this point, the iter that was just freed will still be traversed.
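The stale-index problem described above can be illustrated with a toy model. This is only a sketch in Python, not CPython's real data structures: ToyFrame, FREED, traverse, and run_for_iter_exhausted are hypothetical names, and the save_stack_pointer flag stands in for whatever mechanism would sync frame->stacktop with the eval loop's local stack_pointer.

```python
# Toy model only: none of these names exist in CPython.
FREED = object()  # marker standing in for memory that has been released


class ToyFrame:
    """Stand-in for a frame: value slots plus a saved stack top."""

    def __init__(self, values):
        self.slots = list(values)
        # Saved copy of the stack top; it changes only when written explicitly.
        self.stacktop = len(self.slots)


def traverse(frame):
    """Visit slots the way a traversal bounded by frame.stacktop would."""
    return frame.slots[:frame.stacktop]


def run_for_iter_exhausted(frame, save_stack_pointer):
    """Model the exhausted-iterator path, then a GC pass at that point."""
    stack_pointer = frame.stacktop          # eval loop's local copy
    frame.slots[stack_pointer - 1] = FREED  # models Py_DECREF(iter)
    stack_pointer -= 1                      # models STACK_SHRINK(1)
    if save_stack_pointer:
        # Assumed remedy for illustration: write the local value back.
        frame.stacktop = stack_pointer
    return traverse(frame)


stale = run_for_iter_exhausted(ToyFrame(["seq", "iter"]), save_stack_pointer=False)
synced = run_for_iter_exhausted(ToyFrame(["seq", "iter"]), save_stack_pointer=True)
print(FREED in stale, FREED in synced)
```

In the stale case the traversal still reaches the slot that held the freed iterator, which corresponds to the use-after-free that AddressSanitizer reports. Whether syncing the saved value is the right fix in the real interpreter is left open here.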
There might be something I missed because it's not trivial to reproduce this, but I have a demo that reproduces the issue occasionally:
from multiprocessing import Pool
import sys

def tracefunc(frame, *args):
    a = 100 ** 100

def pool_worker(item):
    return {"a": 1}

def pool_indexer(path):
    item_count = 0
    with Pool(processes=8) as pool:
        for i in pool.imap(pool_worker, range(1, 2000), chunksize=10):
            item_count = item_count + 1

sys.setprofile(tracefunc)
pool_indexer(10)
It might have something to do with the profile function; I think I can only reproduce this with it. You need to build with --with-address-sanitizer to see the error ERROR: AddressSanitizer: heap-use-after-free on address. It is normally reported in Py_TYPE at Include/object.h:135, where the code dereferences ob, which could already be freed.
The memory it points to is often benign, so I'm not able to reliably generate segfaults, but in theory this is a memory violation.
Python Version: cpython/main
OS Version: Ubuntu 20 on WSL