Instrument native threads #308

zhengyu123 · 2025-12-09T20:40:29Z

What does this PR do?:
Instrument and profile native threads, i.e. threads that are created or started outside of the JVM, on HotSpot-based JVMs running on non-musl Linux.
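The diff itself is not part of this conversation, so the following is only a generic sketch of one common way to observe threads created outside the JVM: wrapping the thread start routine so that each new thread is registered before its own code runs. All names here (`instrumented_pthread_create`, `start_trampoline`, `g_registered_threads`) are hypothetical and are not taken from java-profiler.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

// Hypothetical registry: counts the threads seen by the wrapper. A real
// profiler would set up its per-thread sampling state here instead.
static std::atomic<int> g_registered_threads{0};

// Carries the caller's original start routine and argument to the new thread.
struct StartContext {
    void* (*routine)(void*);
    void* arg;
};

// Runs on the new thread: register it first, then hand over to the user code.
static void* start_trampoline(void* raw) {
    StartContext* ctx = static_cast<StartContext*>(raw);
    assert(ctx != nullptr);
    StartContext local = *ctx;  // copy, so the heap block can be freed early
    delete ctx;
    g_registered_threads.fetch_add(1, std::memory_order_relaxed);
    return local.routine(local.arg);
}

// Stand-in for pthread_create (hypothetical name) that routes every new
// thread through the trampoline above.
int instrumented_pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                                void* (*routine)(void*), void* arg) {
    StartContext* ctx = new StartContext{routine, arg};
    int rc = pthread_create(thread, attr, start_trampoline, ctx);
    if (rc != 0) {
        delete ctx;  // the thread never started, so nobody else frees it
    }
    return rc;
}

// Trivial thread body used for illustration: returns its argument unchanged.
static void* echo(void* arg) { return arg; }
```

A caller would use `instrumented_pthread_create(&t, nullptr, echo, &value)` exactly like `pthread_create`. Presumably a wrapper of this kind is installed by redirecting a library's `pthread_create` import, which is an assumption about what "library patching" means in the review discussion.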

Motivation:
Enhance the Java profiler to profile native threads.

Additional Notes:
The feature is currently enabled only for HotSpot-based JVMs running on non-musl Linux. The reasons:

- A crash was seen on aarch64/musl with JDK 11. It might not be related to this change, but that is hard to confirm, so the feature is disabled on musl-based Linux for now.
- J9 has trouble walking native-only thread stacks in release builds: it shows only .no_java_frame, while a debug build shows the correct stack.

How to test the change?:

- Regular tests
- JDK tier1 tests with the profiler agent. There are failures, but they match java-profiler mainline. The expected failures are:
  - HeapMonitor uses an agent, which conflicts with the profiler agent
  - Compiler frame is not compatible with the agent

For Datadog employees:

If this PR touches code that signs or publishes builds or packages, or handles
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.
JIRA: PROF-11577

Unsure? Have a question? Request a review!

ddprof-lib/src/main/cpp/ctimer_linux.cpp

jbachorik

Having the library patching tied to ctimer (CPU profiling) does not sound right.
We have a bunch of other engines that can be used for either CPU or wallclock profiling, and their functionality would be very inconsistent.

I think the library patching should go to a more generic place, and also be called from a more generic place, e.g. profiler.cpp.

zhengyu123 · 2025-12-12T13:21:01Z

Agreed, but I want to limit the changes in this PR and address this issue in a separate PR.

jbachorik · 2025-12-12T13:38:38Z

Ok. But create a followup ticket for that, plz. And let's move to the proper placement asap - we really don't want to get this partial task accidentally released.

jbachorik · 2025-12-12T13:39:37Z

How will this work with dynamically loaded libraries? Am I reading the code right that we will not patch those?
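The thread does not answer this question, and the patching code is not shown here. As a sketch of why it matters, assume (hypothetically) that the libraries to patch are found by a one-time enumeration such as `dl_iterate_phdr`. That call reports only the objects mapped at the moment it runs, so anything loaded later with `dlopen` would be missed unless the scan is repeated. The function name `snapshot_loaded_objects` is made up for illustration.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (Linux)
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per object currently mapped into the process.
// Records the object's path; the main program is usually reported as "".
static int collect_object(struct dl_phdr_info* info, size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->emplace_back(info->dlpi_name != nullptr ? info->dlpi_name : "");
    return 0;  // returning 0 keeps the iteration going
}

// Returns the objects loaded right now. Libraries brought in afterwards via
// dlopen do not appear until this function is called again.
std::vector<std::string> snapshot_loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}
```

Under that assumption, covering dynamically loaded libraries would need either a periodic re-scan or a hook on the loading path itself; which of these, if any, the follow-up work chooses is not stated in this thread.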

jbachorik

Approved with the two followup tickets

pr-commenter · 2025-12-12T18:05:56Z

Benchmarks [x86_64 wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	wall	wall
wall	on	on

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 14 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:fj-kmeans	worse [+487.823ms; +628.177ms] or [+2.095%; +2.698%]	unstable [-240.876MB; +356.547MB] or [-23.161%; +34.283%]
scenario:renaissance:gauss-mix	worse [+751.908ms; +964.092ms] or [+4.165%; +5.340%]	unstable [-391.951MB; +506.412MB] or [-33.122%; +42.795%]

pr-commenter · 2025-12-12T18:06:19Z

Benchmarks [x86_64 cpu]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu	cpu
wall	off	off

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 12 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:page-rank	worse [+0.843s; +1.681s] or [+1.722%; +3.436%]	unstable [-125.381MB; +284.402MB] or [-8.547%; +19.387%]
scenario:renaissance:fj-kmeans	worse [+391.875ms; +504.125ms] or [+1.672%; +2.151%]	unstable [-241.979MB; +363.119MB] or [-23.002%; +34.517%]
scenario:renaissance:gauss-mix	worse [+650.645ms; +941.355ms] or [+3.586%; +5.188%]	unstable [-396.052MB; +507.154MB] or [-33.263%; +42.594%]

pr-commenter · 2025-12-12T18:06:25Z

Benchmarks [x86_64 memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak	memleak
wall	off	off

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 14 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:gauss-mix	worse [+588.434ms; +879.566ms] or [+3.231%; +4.830%]	unstable [-395.580MB; +506.232MB] or [-33.256%; +42.559%]

pr-commenter · 2025-12-12T18:06:34Z

Benchmarks [x86_64 cpu,wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu,wall	cpu,wall
wall	on	on

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 12 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:finagle-http	worse [+582.695ms; +869.305ms] or [+2.189%; +3.266%]	unstable [-260.992MB; +377.187MB] or [-19.077%; +27.571%]
scenario:renaissance:dec-tree	worse [+777.105ms; +978.895ms] or [+2.494%; +3.142%]	unstable [-249.420MB; +341.512MB] or [-17.151%; +23.483%]
scenario:renaissance:gauss-mix	worse [+720.755ms; +827.245ms] or [+3.963%; +4.548%]	unstable [-402.906MB; +501.871MB] or [-33.702%; +41.980%]

pr-commenter · 2025-12-12T18:06:48Z

Benchmarks [x86_64 alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	alloc	alloc
wall	off	off

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 13 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:scala-kmeans	worse [+421.652ms; +1478.348ms] or [+1.824%; +6.396%]	unstable [-227.228MB; +342.267MB] or [-22.934%; +34.546%]
scenario:renaissance:gauss-mix	worse [+720.616ms; +839.384ms] or [+3.964%; +4.618%]	unstable [-402.863MB; +503.579MB] or [-33.641%; +42.051%]

pr-commenter · 2025-12-12T18:06:51Z

Benchmarks [aarch64 wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	wall	wall
wall	on	on

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 15 metrics, 20 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+478.924ms; +745.076ms] or [+3.194%; +4.968%]	unstable [-244.748MB; +563.809MB] or [-28.885%; +66.540%]
scenario:renaissance:fj-kmeans	worse [+473.709ms; +1266.291ms] or [+2.260%; +6.043%]	unstable [-242.451MB; +347.750MB] or [-23.644%; +33.912%]
scenario:renaissance:scala-kmeans	worse [+404.605ms; +703.395ms] or [+1.681%; +2.922%]	unstable [-224.543MB; +337.127MB] or [-23.069%; +34.635%]

pr-commenter · 2025-12-12T18:07:20Z

Benchmarks [aarch64 cpu]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu	cpu
wall	off	off

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 15 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+576.550ms; +703.450ms] or [+3.852%; +4.700%]	unstable [-251.781MB; +561.880MB] or [-29.458%; +65.738%]

pr-commenter · 2025-12-12T18:07:37Z

Benchmarks [x86_64 cpu,wall,alloc,memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	cpu,wall,alloc,memleak	cpu,wall,alloc,memleak
wall	on	on

Summary

Found 0 performance improvements and 4 performance regressions! Performance is the same for 13 metrics, 21 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:finagle-http	worse [+799.445ms; +1064.555ms] or [+3.027%; +4.031%]	unstable [-278.796MB; +360.943MB] or [-20.209%; +26.163%]
scenario:renaissance:future-genetic	worse [+337.593ms; +1230.407ms] or [+2.128%; +7.755%]	unstable [-309.646MB; +415.627MB] or [-31.776%; +42.652%]
scenario:renaissance:par-mnemonics	worse [+705.599ms; +1162.401ms] or [+2.719%; +4.479%]	unstable [-196.383MB; +310.634MB] or [-18.090%; +28.615%]
scenario:renaissance:gauss-mix	worse [+680.559ms; +907.441ms] or [+3.747%; +4.996%]	unstable [-401.064MB; +505.016MB] or [-33.545%; +42.239%]

pr-commenter · 2025-12-12T18:08:22Z

Benchmarks [x86_64 memleak,alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak,alloc	memleak,alloc
wall	off	off

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 13 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:scala-kmeans	worse [+0.410s; +1.890s] or [+1.789%; +8.250%]	unstable [-230.178MB; +339.651MB] or [-23.184%; +34.211%]
scenario:renaissance:gauss-mix	worse [+734.479ms; +909.521ms] or [+4.051%; +5.017%]	unstable [-397.246MB; +506.827MB] or [-33.333%; +42.528%]

pr-commenter · 2025-12-12T18:08:58Z

Benchmarks [aarch64 memleak,alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak,alloc	memleak,alloc
wall	off	off

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 13 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+473.625ms; +622.375ms] or [+3.145%; +4.133%]	unstable [-244.830MB; +564.720MB] or [-28.860%; +66.569%]
scenario:renaissance:chi-square	worse [+742.960ms; +1249.040ms] or [+4.732%; +7.955%]	unstable [-354.940MB; +502.860MB] or [-32.480%; +46.015%]
scenario:renaissance:naive-bayes	worse [+477.032ms; +1366.968ms] or [+3.253%; +9.322%]	unstable [-293.509MB; +662.374MB] or [-30.402%; +68.610%]

pr-commenter · 2025-12-12T18:09:11Z

Benchmarks [aarch64 alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	alloc	alloc
wall	off	off

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 13 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+396.862ms; +639.138ms] or [+2.630%; +4.236%]	unstable [-263.453MB; +526.252MB] or [-29.943%; +59.812%]
scenario:renaissance:chi-square	worse [+684.520ms; +1007.480ms] or [+4.324%; +6.364%]	unstable [-362.298MB; +474.362MB] or [-32.708%; +42.826%]
scenario:renaissance:naive-bayes	worse [+751.380ms; +980.620ms] or [+5.054%; +6.596%]	unstable [-260.480MB; +672.238MB] or [-27.154%; +70.077%]

pr-commenter · 2025-12-12T18:09:33Z

Benchmarks [aarch64 cpu,wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu,wall	cpu,wall
wall	on	on

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 16 metrics, 20 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+423.380ms; +652.620ms] or [+2.808%; +4.328%]	unstable [-276.464MB; +492.085MB] or [-30.600%; +54.466%]
scenario:renaissance:fj-kmeans	worse [+328.081ms; +675.919ms] or [+1.549%; +3.191%]	unstable [-243.479MB; +353.832MB] or [-23.478%; +34.120%]

pr-commenter · 2025-12-12T18:09:58Z

Benchmarks [aarch64 memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak	memleak
wall	off	off

Summary

Found 0 performance improvements and 3 performance regressions! Performance is the same for 13 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:finagle-http	worse [+596.344ms; +747.656ms] or [+1.896%; +2.378%]	unstable [-211.229MB; +337.623MB] or [-15.248%; +24.372%]
scenario:renaissance:future-genetic	worse [+502.708ms; +757.292ms] or [+3.354%; +5.053%]	unstable [-267.706MB; +522.957MB] or [-30.333%; +59.256%]
scenario:renaissance:chi-square	worse [+766.182ms; +1069.818ms] or [+4.838%; +6.755%]	unstable [-376.191MB; +461.550MB] or [-33.695%; +41.340%]

pr-commenter · 2025-12-12T18:10:03Z

Benchmarks [aarch64 cpu,wall,alloc,memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-zgu_inst_native_thread-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	cpu,wall,alloc,memleak	cpu,wall,alloc,memleak
wall	on	on

Summary

Found 0 performance improvements and 4 performance regressions! Performance is the same for 12 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:future-genetic	worse [+500.612ms; +571.388ms] or [+3.319%; +3.788%]	unstable [-250.896MB; +563.909MB] or [-29.328%; +65.916%]
scenario:renaissance:chi-square	worse [+653.677ms; +854.323ms] or [+4.090%; +5.346%]	unstable [-366.047MB; +469.350MB] or [-33.027%; +42.348%]
scenario:renaissance:als	worse [+0.635s; +1.633s] or [+1.674%; +4.307%]	unstable [-187.536MB; +321.887MB] or [-13.006%; +22.324%]
scenario:renaissance:mnemonics	worse [+0.349s; +1.879s] or [+1.607%; +8.638%]	unstable [-240.084MB; +353.434MB] or [-23.316%; +34.324%]

zhengyu123 added 13 commits December 9, 2025 15:06

- Instrument native threads (74158e5)
- Fix (07e480d)
- J9 stack walk (db6afef)
- Cleanup (ba40661)
- Fix bug (3ac6223)
- Set cstack=dwarf for NativeThreadTest (J9) (0bdded8)
- Hotspot/Zing only (48354c2)
- Fix (617ab9a)
- More fix (b445304)
- Exclude J9 from NativeThreadTest (77aaedb)
- Fix (60d8350)
- Handle musl (8cdabce)
- Exclude Musl from NativeThreadTest (36fe9b1)

zhengyu123 marked this pull request as ready for review December 11, 2025 16:21

zhengyu123 requested review from jbachorik and r1viollet December 11, 2025 16:26

- cleanup (5d46345)

jbachorik reviewed Dec 11, 2025

ddprof-lib/src/main/cpp/ctimer_linux.cpp

jbachorik requested changes Dec 11, 2025

- Remove duplicated check (7774a2c)
- Merge branch 'main' into zgu/inst_native_thread (826dfd5)

DataDog deleted a comment from zhengyu123 Dec 12, 2025

jbachorik approved these changes Dec 12, 2025
