Skip to content

chore(profiling): per upload config #13535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

taegyunkim
Copy link
Contributor

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@taegyunkim taegyunkim requested a review from a team as a code owner May 29, 2025 17:43
Copy link
Contributor

github-actions bot commented May 29, 2025

CODEOWNERS have been resolved as:

ddtrace/internal/datadog/profiling/dd_wrapper/include/uploader_config.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/uploader_config.cpp   @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/CMakeLists.txt            @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/ddup_interface.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/uploader.hpp      @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/include/uploader_builder.hpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/ddup_interface.cpp    @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/uploader.cpp          @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/src/uploader_builder.cpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_api.cpp         @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_forking.cpp     @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_initialization.cpp  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_threading.cpp   @DataDog/profiling-python
ddtrace/internal/datadog/profiling/dd_wrapper/test/test_utils.hpp       @DataDog/profiling-python
ddtrace/internal/datadog/profiling/ddup/_ddup.pyi                       @DataDog/profiling-python
ddtrace/internal/datadog/profiling/ddup/_ddup.pyx                       @DataDog/profiling-python
ddtrace/profiling/profiler.py                                           @DataDog/profiling-python
ddtrace/profiling/scheduler.py                                          @DataDog/profiling-python
tests/profiling_v2/collector/test_asyncio.py                            @DataDog/profiling-python
tests/profiling_v2/collector/test_memalloc.py                           @DataDog/profiling-python
tests/profiling_v2/collector/test_stack.py                              @DataDog/profiling-python
tests/profiling_v2/collector/test_threading.py                          @DataDog/profiling-python
tests/profiling_v2/exporter/test_ddup.py                                @DataDog/profiling-python

Copy link
Contributor

github-actions bot commented May 29, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 250 ± 4 ms.

The average import time from base is: 249 ± 4 ms.

The import time difference between this PR and base is: 0.5 ± 0.2 ms.

The difference is not statistically significant (z = 2.92).

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.801 ms (0.72%)
ddtrace.bootstrap.sitecustomize 1.129 ms (0.45%)
ddtrace.bootstrap.preload 1.129 ms (0.45%)
ddtrace.internal.remoteconfig.client 0.598 ms (0.24%)
ddtrace 0.672 ms (0.27%)
ddtrace.internal._unpatched 0.021 ms (0.01%)

@pr-commenter
Copy link

pr-commenter bot commented May 29, 2025

Benchmarks

Benchmark execution time: 2025-06-03 22:48:45

Comparing candidate commit 176c61d in PR branch taegyunkim/pass-upload-config with baseline commit 12791bc in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 505 metrics, 3 unstable metrics.

@taegyunkim taegyunkim added the changelog/no-changelog A changelog entry is not required for this PR. label May 29, 2025
taegyunkim added a commit that referenced this pull request Jun 3, 2025
**tl;dr**
1. Use `uwsgi>=2.0.30` which has
unbit/uwsgi#2726
2. When using `uwsgi<2.0.30` with `--lazy-apps`, also set
`--skip-atexit`, if you see crashes when workers exit.

-----------------------------------------------------

The following explanation happens on `uwsgi<2.0.30`

When `uwsgi` is configured with `--lazy-apps`, master doesn't load the
application, and simply `fork()` to create workers. Then, the workers
load the application. This is different from when it's not set. Without
`--lazy-apps`, master process first load the application then `fork()`.

dd-trace-py profiler uses Python stdlib `atexit` module to export the
profile when the process exits. The code run to export references global
static `std::string` variables. However, when `--lazy-apps` option is
set, they are destructed before when the profiler needs them to export
the profile, and this leads to an undefined behavior.

When `--lazy-apps` is not set the following sequence of events happens
1. uwsgi master process, initialize Python plugin, loads the
application.
3. The application calls `import ddtrace.profiling.auto` and this leads
all native extensions imported to the Python app. During the process,
the constructors for global static variables are called and their
destructors would be registered to be called by `atexit()` when the
program exits.
4. The application also registers `Profiler.stop()` function to `uwsgi`
python module's `atexit` attribute.
5. Then `uwsgi` calls `atexit(uwsgi_plugins_atexit)` to register each
plugins atexit callbacks, for Python `uwsgi_python_atexit()` function.
And the function will call `Py_Finalize()`
6. Master `fork()`s workers
7. When SIGTERM is received at the master, it sends SIGINT to its
workers
8. In each worker, `uwsgi_python_atexit()` is called, invoking
`Profiler.stop()` and `Py_Finalize()` is called, then destructors for
global static variables are called.

When `--lazy-apps` is set
1. uwsgi matser process still intializes Python interpreter, but skips
loading the application
3. Then `uwsgi` calls `atexit(uwsgi_plugins_atexit)` to register each
plugins atexit callbacks, for Python `uwsgi_python_atexit()` function.
And the function will call `Py_Finalize()`
4. Master `fork()`s workers
5. Workers load the application. The application calls `import
ddtrace.profiling.auto` and this leads all native extensions imported to
the Python app. During the process, the constructors for global static
variables are called and their destructors would be registered to be
called by C `atexit()` when the program exits.
6. When `--lazy-apps` is set, ddtrace simply uses Python's `atexit`
module to call `Profiler.stop()` when the program exits.
7. When SIGTERM is received at the master, it sends SIGINT to its
workers
8. In each worker, since global destructors are registered later than
`uwsgi_python_atexit()` , they're called first.
9. Then `Py_Finalize()` is called in part of `uwsgi_python_atexit()`
calling `Profiler.stop()`. Then it will access those destructed global
static strings leading to undefined behavior.

To workaround this I have tried three different approaches. 

1. #11125: Simply pass the needed `std::string` from Python side to
native when exporting pprof file for test
2. #13520: Leak the strings needed for exporting profile to prevent them
from destructed
3. #13535: Bundle all needed strings needed for exporting profile, and
create them whenever we try to upload.

The first approach only addresses issues from tests.

The second approach mimics what's suggested in
pytorch/pytorch#42125, but since there are a
few other global static variables we have throughout the profiler, this
still showed crashes from test.

And we got similar results from the third approach, as we did from the
second.


Another way to completely work around this problem is setting
`--skip-atexit` as suggested in
ansible/ansible-ai-connect-service#248.

When `--skip-atexit` is set, `uwsgi` uses `_exit()` instead of `exit()`,
and the former doesn't call any functions registered with `atexit()` or
`on_exit()`. So, the static variable destructors are never called, and
the app no longer crashes.

The profiler also doesn't flush the profile via `atexit()` when
`--skip-atexit` is set, but I'd argue losing a single profile at the end
is better than crashing the application even if the application is
exiting. So we'd need to require our customers to set `--skip-atexit`
when `--lazy-apps` is set.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant