Potential SegFault with multithreading garbage collection. #101975

Closed
@gaogaotiantian

Description

So far, I have only occasionally observed the segfault on GitHub Actions. The issue is not easy to reproduce, but I tried to understand its cause.

The direct cause appears to be in deduce_unreachable in gcmodule.c. In that function, the GC tries to find cycles by traversing objects, including frames, which use _PyFrame_Traverse to visit all of their objects. _PyFrame_Traverse uses frame->stacktop as the upper bound of the index range covering the locals and the temporary data on the stack (not sure if that's on purpose). However, frame->stacktop is not updated in real time, which means some of the objects it traverses might no longer be valid.

For example, in the FOR_ITER dispatch code, there's a Py_DECREF(iter); STACK_SHRINK(1); when the iterator is exhausted. However, STACK_SHRINK only decrements the local stack_pointer, not frame->stacktop. At this point, the iter that was just freed would still be traversed if garbage collection runs.
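To make the suspected sequence concrete, here is a toy model in Python. It is only a sketch of the idea, not CPython's actual data structures: ToyFrame, FREED, and run_for_iter_exhausted are hypothetical names invented for illustration, and a plain list stands in for the frame's locals and evaluation stack.

```python
FREED = "<freed>"  # hypothetical marker for a slot whose object was deallocated


class ToyFrame:
    """Toy stand-in for an interpreter frame (not CPython's real layout)."""

    def __init__(self, slots):
        self.slots = list(slots)         # locals plus evaluation stack
        self.stacktop = len(self.slots)  # saved copy, refreshed only on sync

    def traverse(self):
        # Models the idea of visiting every slot below the saved stacktop.
        return self.slots[:self.stacktop]


def run_for_iter_exhausted(frame, sync):
    """Model the exhausted-iterator path: free the iterator, shrink the stack."""
    stack_pointer = frame.stacktop          # the eval loop's local copy
    frame.slots[stack_pointer - 1] = FREED  # models Py_DECREF(iter) freeing it
    stack_pointer -= 1                      # models STACK_SHRINK(1)
    if sync:
        frame.stacktop = stack_pointer      # write the local copy back
    return frame.traverse()                 # what a GC pass would visit now


# Stale stacktop: the freed slot is still inside the traversed range.
print(run_for_iter_exhausted(ToyFrame(["x", "iter"]), sync=False))  # ['x', '<freed>']
# Synced stacktop: the freed slot is no longer visited.
print(run_for_iter_exhausted(ToyFrame(["x", "iter"]), sync=True))   # ['x']
```

In the model, the only difference between the two runs is whether the local stack pointer is written back before the traversal, which is the gap described above.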

There might be something I missed, because this is not trivial to reproduce, but I have a demo that reproduces the issue occasionally:

from multiprocessing import Pool
import sys

def tracefunc(frame, *args):
    a = 100 ** 100


def pool_worker(item):
    return {"a": 1}


def pool_indexer(path):
    item_count = 0
    with Pool(processes=8) as pool:
        for i in pool.imap(pool_worker, range(1, 2000), chunksize=10):
            item_count = item_count + 1


sys.setprofile(tracefunc)
pool_indexer(10)

It might have something to do with the profile function; I can only reproduce this when one is set. You need to build CPython with --with-address-sanitizer to see the error, which is reported as ERROR: AddressSanitizer: heap-use-after-free on address. It normally occurs in Py_TYPE (Include/object.h:135), where the code dereferences ob, which could already have been freed.
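Since the error only shows up under AddressSanitizer, it can help to confirm that the interpreter running the demo was actually built with it. The following is a minimal sketch, assuming the build flags are exposed through sysconfig (they may be missing on some platforms, in which case it reports False); built_with_asan is a hypothetical helper name.

```python
import sysconfig


def built_with_asan():
    """Best-effort check for an AddressSanitizer build of this interpreter."""
    # Either variable can be None (for example on Windows), so fall back to "".
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return ("-fsanitize=address" in cflags
            or "--with-address-sanitizer" in config_args)


if __name__ == "__main__":
    print("ASan build:", built_with_asan())
```

If this prints False, the demo may still hit the freed memory, but no heap-use-after-free report should be expected.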

The memory it points to is often benign, so I'm not able to reliably generate SegFaults, but in theory this is a memory violation.

Python Version: cpython/main
OS Version: Ubuntu 20 on WSL

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs)
type-crash (a hard crash of the interpreter, possibly with a core dump)
