Skip to content

Segfault during deallocation of _elementtree.XMLParser #111784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mgorny opened this issue Nov 6, 2023 · 16 comments
Closed

Segfault during deallocation of _elementtree.XMLParser #111784

mgorny opened this issue Nov 6, 2023 · 16 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir release-blocker topic-XML type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@mgorny
Copy link
Contributor

mgorny commented Nov 6, 2023

Crash report

What happened?

I've originally hit it while running slixmpp's test suite. I've been able to reduce it to the following program:

import asyncio
import xml.etree.ElementTree as ET
from typing import Optional


def set_see_other_host() -> Optional[ET.Element]:
    pass


class XMLStream:
    def disconnect(self):
        asyncio.ensure_future(self._end_stream_wait())

    async def _end_stream_wait(self):
        await asyncio.wait_for(asyncio.Future(), 1)

    def _remove_schedules(self):
        pass


parser = ET.XMLPullParser()

xmpp = XMLStream()
foo = xmpp._remove_schedules
xmpp.disconnect()
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.Queue().join())

I've been able to reproduce this with 3.12.0, and the tips of 3.12 and main branches. I've been able to bisect it to the following commit:

commit 1b5a2b085c28d230c9ef9bd9b472afc85e087ced
Author:     Kumar Aditya <[email protected]>
AuthorDate: 2023-05-17 01:35:07 +0200
Commit:     GitHub <[email protected]>
CommitDate: 2023-05-17 01:35:07 +0200

    GH-103092: isolate `_elementtree` (#104561)

Backtrace:

(gdb) bt
#0  0x00007f9da2d8611c in xmlparser_gc_clear (self=0x7f9da2617760) at ./Modules/_elementtree.c:3784
#1  0x00007f9da2d86857 in xmlparser_dealloc (self=0x7f9da2617760) at ./Modules/_elementtree.c:3809
#2  0x0000559a2edd2145 in _Py_Dealloc (op=0x7f9da2617760) at Objects/object.c:2858
#3  0x0000559a2edae16d in Py_DECREF (op=0x7f9da2617760) at ./Include/object.h:879
#4  Py_XDECREF (op=0x7f9da2617760) at ./Include/object.h:972
#5  0x0000559a2edbbadb in _PyObject_FreeInstanceAttributes (self=0x7f9da3146d20) at Objects/dictobject.c:5646
#6  0x0000559a2ee03a71 in subtype_dealloc (self=0x7f9da3146d20) at Objects/typeobject.c:2064
#7  0x0000559a2edd2145 in _Py_Dealloc (op=0x7f9da3146d20) at Objects/object.c:2858
#8  0x0000559a2edae16d in Py_DECREF (op=0x7f9da3146d20) at ./Include/object.h:879
#9  Py_XDECREF (op=0x7f9da3146d20) at ./Include/object.h:972
#10 0x0000559a2edafbef in free_keys_object (interp=0x559a2f2a73a8 <_PyRuntime+76872>, keys=0x7f9da26c9ac0) at Objects/dictobject.c:675
#11 0x0000559a2edaef84 in dictkeys_decref (interp=0x559a2f2a73a8 <_PyRuntime+76872>, dk=0x7f9da26c9ac0) at Objects/dictobject.c:335
#12 0x0000559a2edb31e2 in PyDict_Clear (op=0x7f9da322ba00) at Objects/dictobject.c:2120
#13 0x0000559a2edb7a1e in dict_tp_clear (op=0x7f9da322ba00) at Objects/dictobject.c:3588
#14 0x0000559a2efb69ec in delete_garbage (tstate=0x559a2f30ce28 <_PyRuntime+493256>, gcstate=0x559a2f2a7798 <_PyRuntime+77880>, 
    collectable=0x7ffffccdee70, old=0x559a2f2a77e0 <_PyRuntime+77952>) at Modules/gcmodule.c:1033
#15 0x0000559a2efb707c in gc_collect_main (tstate=0x559a2f30ce28 <_PyRuntime+493256>, generation=2, n_collected=0x0, 
    n_uncollectable=0x0, nofail=1) at Modules/gcmodule.c:1313
#16 0x0000559a2efb8a1e in _PyGC_CollectNoFail (tstate=0x559a2f30ce28 <_PyRuntime+493256>) at Modules/gcmodule.c:2154
#17 0x0000559a2ef74d80 in finalize_modules (tstate=0x559a2f30ce28 <_PyRuntime+493256>) at Python/pylifecycle.c:1677
#18 0x0000559a2ef751cb in Py_FinalizeEx () at Python/pylifecycle.c:1931
#19 0x0000559a2efb4ced in Py_RunMain () at Modules/main.c:709
#20 0x0000559a2efb4d9f in pymain_main (args=0x7ffffccdefd0) at Modules/main.c:737
#21 0x0000559a2efb4e5f in Py_BytesMain (argc=2, argv=0x7ffffccdf138) at Modules/main.c:761
#22 0x0000559a2eccc885 in main (argc=2, argv=0x7ffffccdf138) at ./Programs/python.c:15

CC @kumaraditya303

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.0a1+ (heads/main:ba8aa1fd37, Nov 6 2023, 16:26:31) [GCC 13.2.1 20231014]

Linked PRs

@mgorny mgorny added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 6, 2023
@mdboom
Copy link
Contributor

mdboom commented Nov 6, 2023

Cc: @kumaraditya303

@Eclips4 Eclips4 added the extension-modules C modules in the Modules dir label Nov 6, 2023
@chgnrdv
Copy link
Contributor

chgnrdv commented Nov 6, 2023

Reduced example, just in case:

import _asyncio
import _elementtree

async def _end_stream_wait():
    await "foo"

parser = _elementtree.XMLParser()
_asyncio.Task(_end_stream_wait())

@Eclips4
Copy link
Member

Eclips4 commented Nov 7, 2023

The problem is more complicated than it may seem at first glance.
First of all, problem is unrelated to isolation of _elementtree. In fact, here's the first bad commit - b652d40.

What's actually happening and why this code segfaults?

  1. XMLParser is trying to deallocate himself:

cpython/Modules/_elementtree.c

Lines 3777 to 3785 in 931f443

static int
xmlparser_gc_clear(XMLParserObject *self)
{
elementtreestate *st = self->state;
if (self->parser != NULL) {
XML_Parser parser = self->parser;
self->parser = NULL;
EXPAT(st, ParserFree)(parser);
}

Line 3784 (which actually means (st->expat_capi->ParserFree)(parser)) leads to segfault if the second point (which described below) already executed.
2. pyexpat_capsule trying to deallocate himself (st->expat_capi of _elementtree is pointing at this data!)

Where's the problem? Order of execution of two points described above is not strict. Second point can be executed before first ponit and vice-versa (which is okay for us). Situation when pyexpat_capsule deallocated before deallocation of XMLParser leads to segmentation fault.

cc @vstinner

@Eclips4 Eclips4 added 3.12 only security fixes 3.13 bugs and security fixes labels Nov 7, 2023
@Eclips4 Eclips4 changed the title Segfault following "GH-103092: isolate _elementtree (#104561)" Segfault during deallocation of _elementtree.XMLParser Nov 7, 2023
@vstinner
Copy link
Member

vstinner commented Nov 7, 2023

_elementtree extension should make sure that the pyexpat extension remains usable until it is done using it.

struct PyExpat_CAPI *expat_capi; of elementtreestate is just a pointer, it doesn't impact lifetime of the capsule memory or the pyexpat extension.

@Eclips4
Copy link
Member

Eclips4 commented Nov 10, 2023

_elementtree extension should make sure that the pyexpat extension remains usable until it is done using it.

struct PyExpat_CAPI *expat_capi; of elementtreestate is just a pointer, it doesn't impact lifetime of the capsule memory or the pyexpat extension.

Yes, that is what we should do. But I don't understand how we can make sure that the pyexpat extension remains usable until _elementtree is done using it.

@vstinner
Copy link
Member

vstinner commented Nov 10, 2023

Can we keep a strong reference to the pyexpat module? Is it enough?

@Eclips4
Copy link
Member

Eclips4 commented Nov 10, 2023

Can we keep a strong reference to the pyexat module? Is it enough?

There's no such API, and yes, it will be enough.
Imported capsule doesn't store any information about module which is bounded to.

@Eclips4
Copy link
Member

Eclips4 commented Nov 10, 2023

To be honest, I don't see any solution without changing some code of capsule's implementation..

@vstinner
Copy link
Member

To be honest, I don't see any solution without changing some code of capsule's implementation..

That's acceptable and has been done recently: see commit 513c89d.

@Eclips4
Copy link
Member

Eclips4 commented Nov 14, 2023

To be honest, I don't see any solution without changing some code of capsule's implementation..

That's acceptable and has been done recently: see commit 513c89d.

I've write a draft PR, please take a look at the #112053.

@Eclips4
Copy link
Member

Eclips4 commented Dec 2, 2023

That's a kinda hard issue 😄
First of all, there's a two segfaults, first is described above (#111784 (comment))
Then, let's talk about second segfault. It's funny, but code segfaults at the same place! I mean EXPAT(st, ParserFree)(parser).

Why does it happens?
1._elementtree module is trying to deallocate, so it's call module_dealloc in moduleobject.c

There's a one interesting thing:

    if (m->md_state != NULL)
        PyMem_Free(m->md_state);

It's "clears" the state of module, if it's isn't freed previously. That's incorrect, because there is no guarantee that module doesn't use state to do some cleanup things in his deallocators.
Ok, state is cleared, let's move on.
2. After that, elementtree_clear is called (or elementtree_free, maybe it doesnt matter because elementtree_free calls elementtree_clear)
3. The paragraph above executes the Py_CLEAR(st->XMLParser_Type);
4. During the execution of previous step, it's calls the xmlparser_dealloc which calls the xmlparser_gc_clear.
5. Finally, xmlparser_gc_clear calls the EXPAT(st, ParserFree)(parser) (reminder, it's interaction with the state).
But... state is already cleaned (now it's points to 0xDDDDDDDDDDDDDDDD which mean that this pointer already freed) in first step! So, there's a segfault.

I'll take an opportunity to add here release-blocker label, because it's can happen in real user code. Do not forget, this segfault is present in 3.12, so in my opinion we should fix it until the next version of 3.12 is released.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 4, 2023

That's really strange. IIRC, an extension module should never need to free it's allocated state explicitly. That sounds like a serious bug to me; module state should be freed by the runtime. (I haven't looked at the code yet.) cc. @vstinner @encukou

Eclips4 added a commit to Eclips4/cpython that referenced this issue Dec 22, 2023
@Eclips4
Copy link
Member

Eclips4 commented Dec 22, 2023

I wrote a PR, please check #113405

@Eclips4
Copy link
Member

Eclips4 commented Dec 22, 2023

That's really strange. IIRC, an extension module should never need to free it's allocated state explicitly. That sounds like a serious bug to me; module state should be freed by the runtime. (I haven't looked at the code yet.) cc. @vstinner @encukou

Seems you misunderstand me. Extension module doesn't tries to free his state. module_dealloc (I mean tp_dealloc slot of module object which located in moduleobject.c) called and frees the state. Why is it called - seems that there no references to module any longer, and this deallocate the state, which wrong, because some instances of XMLParser need this state.

serhiy-storchaka pushed a commit that referenced this issue Dec 24, 2023
First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 24, 2023
…H-113405)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
(cherry picked from commit 894f0e5)

Co-authored-by: Kirill Podoprigora <[email protected]>
serhiy-storchaka pushed a commit that referenced this issue Dec 24, 2023
) (GH-113446)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
(cherry picked from commit 894f0e5)

Co-authored-by: Kirill Podoprigora <[email protected]>
@Eclips4
Copy link
Member

Eclips4 commented Dec 24, 2023

Thanks @mgorny for the report!

@mgorny
Copy link
Contributor Author

mgorny commented Dec 24, 2023

Thanks a lot!

ryan-duve pushed a commit to ryan-duve/cpython that referenced this issue Dec 26, 2023
…H-113405)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
kulikjak pushed a commit to kulikjak/cpython that referenced this issue Jan 22, 2024
…H-113405)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…H-113405)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…H-113405)

First fix resolve situation when pyexpat module (which contains expat_CAPI
capsule) deallocates before _elementtree, so we need to hold a strong
reference to pyexpat module to.

Second fix resolve situation when module state is deallocated before
deallocation of XMLParser instances, which uses module state to clear
some stuff.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir release-blocker topic-XML type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

7 participants