-
Couldn't load subscription status.
- Fork 314
Add a PoC for OTel process context support #9472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
|
🎯 Code Coverage 🔗 Commit SHA: b15d899 | Docs | Was this helpful? Give us feedback! |
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 47 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.046 s) : 0, 1046176
Total [baseline] (10.77 s) : 0, 10770398
Agent [candidate] (1.046 s) : 0, 1046031
Total [candidate] (10.718 s) : 0, 10718463
section appsec
Agent [baseline] (1.232 s) : 0, 1232211
Total [baseline] (10.886 s) : 0, 10886183
Agent [candidate] (1.222 s) : 0, 1221732
Total [candidate] (10.764 s) : 0, 10763775
section iast
Agent [baseline] (1.183 s) : 0, 1183407
Total [baseline] (10.959 s) : 0, 10958523
Agent [candidate] (1.183 s) : 0, 1182795
Total [candidate] (11.007 s) : 0, 11006556
section profiling
Agent [baseline] (1.2 s) : 0, 1199562
Total [baseline] (10.877 s) : 0, 10877289
Agent [candidate] (1.203 s) : 0, 1203138
Total [candidate] (10.907 s) : 0, 10907336
gantt
title petclinic - break down per module: candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.462 ms) : 0, 1462
crashtracking [candidate] (1.447 ms) : 0, 1447
BytebuddyAgent [baseline] (732.576 ms) : 0, 732576
BytebuddyAgent [candidate] (732.939 ms) : 0, 732939
GlobalTracer [baseline] (242.353 ms) : 0, 242353
GlobalTracer [candidate] (241.492 ms) : 0, 241492
AppSec [baseline] (30.242 ms) : 0, 30242
AppSec [candidate] (30.06 ms) : 0, 30060
Debugger [baseline] (6.073 ms) : 0, 6073
Debugger [candidate] (6.065 ms) : 0, 6065
Remote Config [baseline] (691.503 µs) : 0, 692
Remote Config [candidate] (673.273 µs) : 0, 673
Telemetry [baseline] (11.58 ms) : 0, 11580
Telemetry [candidate] (12.141 ms) : 0, 12141
section appsec
crashtracking [baseline] (1.473 ms) : 0, 1473
crashtracking [candidate] (1.449 ms) : 0, 1449
BytebuddyAgent [baseline] (761.02 ms) : 0, 761020
BytebuddyAgent [candidate] (756.185 ms) : 0, 756185
GlobalTracer [baseline] (237.091 ms) : 0, 237091
GlobalTracer [candidate] (233.643 ms) : 0, 233643
AppSec [baseline] (170.321 ms) : 0, 170321
AppSec [candidate] (169.078 ms) : 0, 169078
Debugger [baseline] (8.051 ms) : 0, 8051
Debugger [candidate] (6.453 ms) : 0, 6453
Remote Config [baseline] (631.275 µs) : 0, 631
Remote Config [candidate] (627.269 µs) : 0, 627
Telemetry [baseline] (8.544 ms) : 0, 8544
Telemetry [candidate] (9.978 ms) : 0, 9978
IAST [baseline] (23.864 ms) : 0, 23864
IAST [candidate] (23.252 ms) : 0, 23252
section iast
crashtracking [baseline] (1.451 ms) : 0, 1451
crashtracking [candidate] (1.461 ms) : 0, 1461
BytebuddyAgent [baseline] (853.217 ms) : 0, 853217
BytebuddyAgent [candidate] (854.211 ms) : 0, 854211
GlobalTracer [baseline] (234.296 ms) : 0, 234296
GlobalTracer [candidate] (232.806 ms) : 0, 232806
AppSec [baseline] (28.642 ms) : 0, 28642
AppSec [candidate] (28.038 ms) : 0, 28038
Debugger [baseline] (5.832 ms) : 0, 5832
Debugger [candidate] (9.26 ms) : 0, 9260
Remote Config [baseline] (607.501 µs) : 0, 608
Remote Config [candidate] (601.039 µs) : 0, 601
Telemetry [baseline] (8.416 ms) : 0, 8416
Telemetry [candidate] (8.159 ms) : 0, 8159
IAST [baseline] (29.769 ms) : 0, 29769
IAST [candidate] (26.967 ms) : 0, 26967
section profiling
crashtracking [baseline] (1.44 ms) : 0, 1440
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (762.556 ms) : 0, 762556
BytebuddyAgent [candidate] (766.393 ms) : 0, 766393
GlobalTracer [baseline] (222.573 ms) : 0, 222573
GlobalTracer [candidate] (221.446 ms) : 0, 221446
AppSec [baseline] (30.609 ms) : 0, 30609
AppSec [candidate] (30.377 ms) : 0, 30377
Debugger [baseline] (6.283 ms) : 0, 6283
Debugger [candidate] (6.341 ms) : 0, 6341
Remote Config [baseline] (708.247 µs) : 0, 708
Remote Config [candidate] (692.09 µs) : 0, 692
Telemetry [baseline] (15.603 ms) : 0, 15603
Telemetry [candidate] (16.26 ms) : 0, 16260
ProfilingAgent [baseline] (108.974 ms) : 0, 108974
ProfilingAgent [candidate] (109.06 ms) : 0, 109060
Profiling [baseline] (109.615 ms) : 0, 109615
Profiling [candidate] (109.663 ms) : 0, 109663
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.052 s) : 0, 1051902
Total [baseline] (8.706 s) : 0, 8706124
Agent [candidate] (1.062 s) : 0, 1061503
Total [candidate] (8.673 s) : 0, 8673052
section iast
Agent [baseline] (1.184 s) : 0, 1184361
Total [baseline] (9.388 s) : 0, 9387907
Agent [candidate] (1.191 s) : 0, 1190884
Total [candidate] (9.329 s) : 0, 9329066
gantt
title insecure-bank - break down per module: candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.477 ms) : 0, 1477
crashtracking [candidate] (1.466 ms) : 0, 1466
BytebuddyAgent [baseline] (734.92 ms) : 0, 734920
BytebuddyAgent [candidate] (743.879 ms) : 0, 743879
GlobalTracer [baseline] (243.88 ms) : 0, 243880
GlobalTracer [candidate] (245.333 ms) : 0, 245333
AppSec [baseline] (30.383 ms) : 0, 30383
AppSec [candidate] (30.86 ms) : 0, 30860
Debugger [baseline] (6.14 ms) : 0, 6140
Debugger [candidate] (6.187 ms) : 0, 6187
Remote Config [baseline] (713.728 µs) : 0, 714
Remote Config [candidate] (696.538 µs) : 0, 697
Telemetry [baseline] (13.144 ms) : 0, 13144
Telemetry [candidate] (11.784 ms) : 0, 11784
section iast
crashtracking [baseline] (1.48 ms) : 0, 1480
crashtracking [candidate] (1.475 ms) : 0, 1475
BytebuddyAgent [baseline] (854.348 ms) : 0, 854348
BytebuddyAgent [candidate] (861.986 ms) : 0, 861986
GlobalTracer [baseline] (233.612 ms) : 0, 233612
GlobalTracer [candidate] (233.566 ms) : 0, 233566
AppSec [baseline] (26.524 ms) : 0, 26524
AppSec [candidate] (28.056 ms) : 0, 28056
Debugger [baseline] (8.528 ms) : 0, 8528
Debugger [candidate] (5.835 ms) : 0, 5835
Remote Config [baseline] (604.64 µs) : 0, 605
Remote Config [candidate] (623.692 µs) : 0, 624
Telemetry [baseline] (8.322 ms) : 0, 8322
Telemetry [candidate] (8.364 ms) : 0, 8364
IAST [baseline] (29.761 ms) : 0, 29761
IAST [candidate] (29.667 ms) : 0, 29667
LoadParameters
See matching parameters
SummaryFound 3 performance improvements and 5 performance regressions! Performance is the same for 4 metrics, 12 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section baseline
no_agent (38.639 ms) : 38337, 38940
. : milestone, 38639,
appsec (46.757 ms) : 46354, 47161
. : milestone, 46757,
code_origins (42.946 ms) : 42575, 43317
. : milestone, 42946,
iast (45.763 ms) : 45366, 46160
. : milestone, 45763,
profiling (47.015 ms) : 46592, 47437
. : milestone, 47015,
tracing (44.421 ms) : 44041, 44801
. : milestone, 44421,
section candidate
no_agent (37.273 ms) : 36977, 37569
. : milestone, 37273,
appsec (49.153 ms) : 48714, 49591
. : milestone, 49153,
code_origins (45.08 ms) : 44681, 45478
. : milestone, 45080,
iast (45.657 ms) : 45273, 46042
. : milestone, 45657,
profiling (46.642 ms) : 46223, 47061
. : milestone, 46642,
tracing (45.061 ms) : 44678, 45443
. : milestone, 45061,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section baseline
no_agent (4.263 ms) : 4215, 4311
. : milestone, 4263,
iast (9.601 ms) : 9439, 9764
. : milestone, 9601,
iast_FULL (13.935 ms) : 13660, 14210
. : milestone, 13935,
iast_GLOBAL (10.152 ms) : 9977, 10326
. : milestone, 10152,
profiling (9.07 ms) : 8928, 9213
. : milestone, 9070,
tracing (7.334 ms) : 7231, 7437
. : milestone, 7334,
section candidate
no_agent (4.521 ms) : 4469, 4572
. : milestone, 4521,
iast (9.14 ms) : 8992, 9288
. : milestone, 9140,
iast_FULL (13.869 ms) : 13598, 14141
. : milestone, 13869,
iast_GLOBAL (10.565 ms) : 10378, 10752
. : milestone, 10565,
profiling (8.723 ms) : 8592, 8853
. : milestone, 8723,
tracing (7.612 ms) : 7500, 7724
. : milestone, 7612,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section baseline
no_agent (15.273 s) : 15273000, 15273000
. : milestone, 15273000,
appsec (14.827 s) : 14827000, 14827000
. : milestone, 14827000,
iast (18.657 s) : 18657000, 18657000
. : milestone, 18657000,
iast_GLOBAL (18.061 s) : 18061000, 18061000
. : milestone, 18061000,
profiling (15.392 s) : 15392000, 15392000
. : milestone, 15392000,
tracing (14.728 s) : 14728000, 14728000
. : milestone, 14728000,
section candidate
no_agent (15.53 s) : 15530000, 15530000
. : milestone, 15530000,
appsec (14.816 s) : 14816000, 14816000
. : milestone, 14816000,
iast (18.506 s) : 18506000, 18506000
. : milestone, 18506000,
iast_GLOBAL (18.319 s) : 18319000, 18319000
. : milestone, 18319000,
profiling (15.519 s) : 15519000, 15519000
. : milestone, 15519000,
tracing (14.913 s) : 14913000, 14913000
. : milestone, 14913000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.54.0-SNAPSHOT~b15d899edc, baseline=1.54.0-SNAPSHOT~fa788acdc2
dateFormat X
axisFormat %s
section baseline
no_agent (1.474 ms) : 1462, 1485
. : milestone, 1474,
appsec (2.461 ms) : 2409, 2513
. : milestone, 2461,
iast (2.197 ms) : 2135, 2259
. : milestone, 2197,
iast_GLOBAL (2.245 ms) : 2182, 2307
. : milestone, 2245,
profiling (2.06 ms) : 2009, 2111
. : milestone, 2060,
tracing (2.01 ms) : 1962, 2058
. : milestone, 2010,
section candidate
no_agent (1.475 ms) : 1464, 1486
. : milestone, 1475,
appsec (3.664 ms) : 3449, 3879
. : milestone, 3664,
iast (2.201 ms) : 2138, 2264
. : milestone, 2201,
iast_GLOBAL (2.248 ms) : 2185, 2311
. : milestone, 2248,
profiling (2.044 ms) : 1994, 2095
. : milestone, 2044,
tracing (2.016 ms) : 1967, 2065
. : milestone, 2016,
|
| private static OTelContextHolder initOtelContext() { | ||
| ConfigProvider configProvider = ConfigProvider.getInstance(); | ||
| AtomicReference<Throwable> reasonNotLoaded = new AtomicReference<>(); | ||
| OTelContext otelContext = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ question: Did they name it using the same name / semantic than their Context API? Aren't we be confused with the existing datadog.opentelemetry.shim.context.OtelContext from the Context API (or even datadog.opentelemetry.shim.trace.OtelSpanContext / datadog.trace.instrumentation.opentelemetry.OtelSpanContext from the Tracing API)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question.
There is no explicit naming anywhere, AFAICT
Go went with 'process context' (DataDog/dd-trace-go#3937) - but, the long-term plan is to support both process-level as well as thread-level context transfer to the ebpf profiler. So, I decided to use OTelContext as the wrapper for accessing the native functionality related to that concept. Only later I found the OtelContext shim :(
This is all still in progress - if this is confusing we can change the epbf context propagator native wrapper class name to something else, before making this GA.
| if (configProvider.getBoolean( | ||
| ProfilingConfig.PROFILING_PROCESS_CONTEXT_ENABLED, | ||
| ProfilingConfig.PROFILING_PROCESS_CONTEXT_ENABLED_DEFAULT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ question: Is there a reason to use ConfigProvider instead of creating an entry into Config for the new PROFILING_PROCESS_CONTEXT_ENABLED? Is that because it's an experimental flag only that won't stay for long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's the main reason.
Also, Config is a kind of abomination in its current form :) Is there anything particularly important I am missing by using the ConfigProvider directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried answering the same question when looking at the flare reporter & came to the conclusion that anything profiling-specific should be dumped in ProfilingConfig, while features that could/should affect other components are better off in Config. I think given the experimental nature & profiling-specific func., we should keep it here for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything particularly important I am missing by using the ConfigProvider directly?
Not really. Telemetry and config inversion will still work using ConfigProvider.
It might be a bit harder (not even sure) for newcomers to find out where is config is setup if it's handled on the side.
We really failed at providing useful / helpful config API so that's totally fine as it is right now. Maybe at some point we should advocate for product to create their own Config when it does not interact with the others.
| // Register the profiler flare before we start the profiling system, but early during the | ||
| // profiler lifecycle | ||
| ProfilerFlareReporter.register(); | ||
| ProcessContext.register(configProvider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to report any issues with registration via the flare or telemetry similar to what's currently in the exception handling blocks? I see the registration method itself does some warn-level logging, the extended reporting could live there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will add the logging once the upstream is finalized.
As it is now, this is a very raw PoC that is disabled by default, the config key is not published anywhere and it directly spells 'experimental', so whoever enables this are on their own.
Once this becomes a product feature the logs should be treated as a part of initialization and saved for profiler flare.
|
Tiny comment: Is it possible to enable the setting via env var, or only via the CLI arg? (Asking as enabling via env var can be a bit easier on the reliability environment and the like) |
We are using the standard config mechanism which allows both, system properties and env variable, using the same conversion convention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved - if this were not an experimental feature, I think it would be nice to create a smoke test (similar to the flare reporter). I don't think it should be a blocker for this iteration, the ProcessContextTest with mocking is sufficient.
What Does This Do
Adds a PoC capability to expose the process context information to OTel profiler.
This is disabled by default, being a PoC, and needs to be manually enabled by setting
-Ddd.profiling.experimental.process_context.enabled=trueMotivation
Try out the integration of the process context with OTel profiler.
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any usefull labelsclose,fixor any linking keywords when referencing an issue.Use
solvesinstead, and assign the PR milestone to the issueJira ticket: PROF-12377