Skip to content

Materialized instance dictionaries are not GC-tracked #133543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
emontnemery opened this issue May 6, 2025 · 6 comments
Closed

Materialized instance dictionaries are not GC-tracked #133543

emontnemery opened this issue May 6, 2025 · 6 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@emontnemery
Copy link
Contributor

emontnemery commented May 6, 2025

Bug report

Bug description:

Objects can't be garbage collected after accessing __dict__. The issue was noticed on a class with many cached_property which store the cache entries by directly accessing the instance __dict__.

If cached_property is modified by removing this line, the issue goes away:
https://github.com/python/cpython/blob/3.13/Lib/functools.py#L1028

The attached reproduction creates instances of two classes and checks if they are freed or not.

Reproduction of the issue with the attached pytest test cases:

mkdir cached_property_issue
python3 -m venv .venv
source .venv/bin/activate
pip3 install pytest
pytest test_cached_property_issue.py
from __future__ import annotations

from collections.abc import Generator
from functools import cached_property
import gc
from typing import Any, Self
import weakref

import pytest

hass_instances: list[weakref.ref[HackHomeAssistant]] = []
sensor_instances: list[weakref.ReferenceType[HackEntity]] = []


@pytest.fixture(autouse=True, scope="module")
def garbage_collection() -> Generator[None]:
    """Run garbage collection and check instances."""
    yield
    gc.collect()
    assert [bool(obj()) for obj in hass_instances] == [False, True]
    assert [bool(obj()) for obj in sensor_instances] == [False, True]


class HackEntity:
    """A class with many properties."""

    entity_id: str | None = None
    platform = None
    registry_entry = None
    _unsub_device_updates = None

    @cached_property
    def should_poll(self) -> bool:
        return False

    @cached_property
    def unique_id(self) -> None:
        return None

    @cached_property
    def has_entity_name(self) -> bool:
        return False

    @cached_property
    def _device_class_name(self) -> None:
        return None

    @cached_property
    def _unit_of_measurement_translation_key(self) -> None:
        return None

    @property
    def name(self) -> None:
        return None

    @cached_property
    def capability_attributes(self) -> None:
        return None

    @cached_property
    def state_attributes(self) -> None:
        return None

    @cached_property
    def extra_state_attributes(self) -> None:
        return None

    @cached_property
    def device_class(self) -> None:
        return None

    @cached_property
    def unit_of_measurement(self) -> None:
        return None

    @cached_property
    def icon(self) -> None:
        return None

    @cached_property
    def entity_picture(self) -> None:
        return None

    @cached_property
    def available(self) -> bool:
        return True

    @cached_property
    def assumed_state(self) -> bool:
        return False

    @cached_property
    def force_update(self) -> bool:
        return False

    @cached_property
    def supported_features(self) -> None:
        return None

    @cached_property
    def attribution(self) -> None:
        return None

    @cached_property
    def translation_key(self) -> None:
        return None

    @cached_property
    def options(self) -> None:
        return None

    @cached_property
    def state_class(self) -> None:
        return None

    @cached_property
    def native_unit_of_measurement(self) -> None:
        return None

    @cached_property
    def suggested_unit_of_measurement(self) -> None:
        return None

    @property
    def state(self) -> Any:
        self._unit_of_measurement_translation_key
        self.native_unit_of_measurement
        self.options
        self.state_class
        self.suggested_unit_of_measurement
        self.translation_key
        return None

    def async_write_ha_state(self) -> None:
        self._verified_state_writable = True
        self.capability_attributes

        self.available
        self.state
        self.extra_state_attributes
        self.state_attributes

        self.unit_of_measurement
        self.assumed_state
        self.attribution
        self.device_class
        self.entity_picture
        self.icon
        self._device_class_name
        self.has_entity_name
        self.supported_features
        self.force_update

    def async_on_remove(self) -> None:
        self._on_remove = []

    def add_to_platform_start(
        self, hass: HackHomeAssistant, platform: HackEntityPlatform
    ) -> None:
        self.hass = hass
        self.platform = platform


class CompensationSensor(HackEntity):
    """This concrete class won't be garbage collected."""

    def __init__(self) -> None:
        """Initialize the Compensation sensor."""
        self.__dict__
        sensor_instances.append(weakref.ref(self))
        self.parallel_updates = None
        self._platform_state = None

        self._state_info = {}
        self.async_write_ha_state()


class CompensationSensor2(HackEntity):
    """This concrete class will be garbage collected."""

    def __init__(self) -> None:
        """Initialize the Compensation sensor."""
        sensor_instances.append(weakref.ref(self))
        self.parallel_updates = None
        self._platform_state = None

        self._state_info = {}
        self.async_write_ha_state()


class HackHomeAssistant:
    """Root object of the Home Assistant home automation."""

    def __new__(cls) -> Self:
        """Set the _hass thread local data."""
        hass = super().__new__(cls)
        hass_instances.append(weakref.ref(hass))
        return hass


class HackEntityPlatform:
    """Manage the entities for a single platform."""

    def __init__(
        self,
        hass: HackHomeAssistant,
    ) -> None:
        """Initialize the entity platform."""
        self.hass = hass
        self.entities: dict[str, HackEntity] = {}

    def async_add_entity(
        self,
        new_entity: HackEntity,
    ) -> None:
        """Add entities for a single platform async."""

        entity = new_entity
        entity.add_to_platform_start(self.hass, self)

        entity.unique_id

        entity.entity_id = "sensor.test"
        entity_id = entity.entity_id
        self.entities[entity_id] = entity

        entity.async_on_remove()

        entity.should_poll


def test_limits1() -> None:
    """Create an object which will be garbage collected."""
    hass = HackHomeAssistant()
    sens = CompensationSensor2()
    entity_platform = HackEntityPlatform(hass)
    entity_platform.async_add_entity(sens)


def test_limits2() -> None:
    """Create an object which won't be garbage collected."""
    hass = HackHomeAssistant()
    sens = CompensationSensor()
    entity_platform = HackEntityPlatform(hass)
    entity_platform.async_add_entity(sens)


def test_limits3() -> None:
    """Last test, garbage collection check runs after this test."""

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Linked PRs

@emontnemery emontnemery added the type-bug An unexpected behavior, bug, or error label May 6, 2025
@brandtbucher
Copy link
Member

brandtbucher commented May 6, 2025

Thanks for the report. Do you have a smaller, simpler reproducer? Ideally fewer than 10 lines or so, with no pytest dependency?

@brandtbucher
Copy link
Member

Minimal reproducer:

import gc
import weakref

class C:
    pass

o = C()

# Comment out this line to "fix" the leak:
o.__dict__

# Set exactly 30 attributes, one of which is a cycle:
o.cycle = o
for i in range(29):
    setattr(o, f"attr_{i}", None)

# o is only reachable through the weakref:
ref = weakref.ref(o)
del o

gc.collect()

assert ref() is None, "o is leaked!"

Confirmed leak on 3.13, works fine on 3.12 and 3.14. Probably related to lazy __dict__ materialization.

@brandtbucher
Copy link
Member

Ah, this is super subtle.

When the dictionary is materialized with o.__dict__, it is untracked (since it does not contain any GC tracked values). However, it's not re-tracked on the following line, when the cycle is inserted. That's the bug.

@brandtbucher
Copy link
Member

brandtbucher commented May 7, 2025

store_instance_attr_lock_held does not maintain tracking for materialized instance dictionaries. 30 attributes are needed to overflow the inline values, which puts the dict in charge of traversing itself (with fewer attrs, the object just visits its own values and skips the keys... see PyObject_VisitManagedDict).

This patch for the 3.13 branch fixes the reproducer:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 680d6beb760..932be96827f 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -6838,6 +6838,7 @@ store_instance_attr_lock_held(PyObject *obj, PyDictValues *values,
                                    value == NULL ? PyDict_EVENT_DELETED :
                                    PyDict_EVENT_MODIFIED);
         _PyDict_NotifyEvent(interp, event, dict, name, value);
+        MAINTAIN_TRACKING(dict, name, value);
     }
 
     FT_ATOMIC_STORE_PTR_RELEASE(values->values[ix], Py_XNewRef(value));

I haven't looked into why 3.14 is fine yet.

@brandtbucher
Copy link
Member

I haven't looked into why 3.14 is fine yet.

All dictionaries are tracked now: GH-127027

@emontnemery
Copy link
Contributor Author

store_instance_attr_lock_held does not maintain tracking for materialized instance dictionaries. 30 attributes are needed to overflow the inline values, which puts the dict in charge of traversing itself (with fewer attrs, the object just visits its own values and skips the keys... see PyObject_VisitManagedDict).

This patch for the 3.13 branch fixes the reproducer:
...

I haven't looked into why 3.14 is fine yet.

Amazing! I'm sorry about the very large reproducer.

@ZeroIntensity ZeroIntensity added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.13 bugs and security fixes labels May 7, 2025
@brandtbucher brandtbucher changed the title Objects can't be garbage collected after accessing __dict__ Materialized instance dictionaries are not GC-tracked May 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants