@daniel-mohedano daniel-mohedano commented Jul 22, 2025

What Does This Do

Implements Test Optimization's Failed Test Replay on top of Live Debugger's Exception Replay. When the feature is enabled and a test is retried by Auto Test Retries, Exception Replay's logic creates a probe for the exception that was thrown (for a test this is most likely an assertion error, but it is not limited to that). If the exception is encountered again on a retry, the probe captures debugging information as a snapshot of the variables. Each captured snapshot is sent as a log to Datadog. The following modifications were made to Exception Replay's original implementation:

  • Exception Replay is enabled if Failed Test Replay is enabled by the user.
    • This is done through the DebuggerConfigBridge. It handles deferred updates, so no order dependency on startup is introduced between the products that want to use Live Debugger's features.
    • The existing configuration update logic used with Remote Config now also goes through the same system.
  • A new FailedTestReplayExceptionDebugger was created to support the feature. It will:
    • Instrument Errors, which were previously ignored.
    • Ignore the limit on the maximum number of exceptions per second.
    • Ignore the exception capturing cooldown.
    • Apply the instrumentation synchronously. Failed test retries can be performed in rapid succession, and with the async approach the instrumentation was usually not in place before the next test failure. This has also been added as a separate configuration option so that regular Exception Replay can use it.
  • Adds a product field to snapshots, populated with test_optimization if Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.
  • Removed Live Debugger's dependency on Remote Config being enabled for its configuration to be initialized.
  • Exception Replay now supports Agentless mode. For now this is tied to CiVisibility's agentless mode: if DD_CIVISIBILITY_AGENTLESS_ENABLED is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.
  • DebuggerSink now flushes on closing to avoid snapshots not being sent on test session finish.
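
As an illustration of the deferred-update idea behind DebuggerConfigBridge, here is a minimal sketch. It is not the code in this PR: `ConfigUpdater`, `ConfigUpdate`, and `DeferredConfigBridge` are hypothetical names, and the sketch assumes that only the most recent pending update needs to be kept until the debugger registers itself.

```java
/** Hypothetical stand-in for the component that applies debugger configuration. */
interface ConfigUpdater {
  void updateConfig(ConfigUpdate update);
}

/** Hypothetical update payload: a null field means "leave this setting unchanged". */
final class ConfigUpdate {
  final Boolean exceptionReplayEnabled;

  ConfigUpdate(Boolean exceptionReplayEnabled) {
    this.exceptionReplayEnabled = exceptionReplayEnabled;
  }
}

/** Sketch of a bridge that buffers an update until an updater is registered. */
final class DeferredConfigBridge {
  private ConfigUpdater updater; // set once the debugger has started
  private ConfigUpdate pending; // latest update received before that point

  /** Called by a product (for example Failed Test Replay) that wants a setting changed. */
  synchronized void updateConfig(ConfigUpdate update) {
    if (updater != null) {
      updater.updateConfig(update);
    } else {
      pending = update; // assumption: only the latest deferred update matters
    }
  }

  /** Called by the debugger once it is ready to receive configuration. */
  synchronized void setUpdater(ConfigUpdater newUpdater) {
    updater = newUpdater;
    if (pending != null) {
      ConfigUpdate deferred = pending;
      pending = null;
      newUpdater.updateConfig(deferred);
    }
  }
}
```

With this shape, the caller and the debugger can start in either order: an update sent too early is replayed when `setUpdater` is called, and later updates are applied directly.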

Additional changes:

  • Refactored BackendApiFactory.Intake to a standalone Intake, given that it is useful in order to compute agentless mode URLs.
  • Updated libraries capabilities to add failed_test_replay in test frameworks that support Auto Test Retries.
  • Other changes related to adding di_enabled to the Settings response and telemetry.

Validation:

  • MavenSmokeTest now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.
  • Implemented JUnitConsoleSmokeTest to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.
  • Both smoke tests also validate:
    • Tests that do not have an Auto Test Retries execution strategy will not have probes installed.
    • Snapshot data is captured for all test retries and not limited to the first one.

Motivation

Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.

Contributor Checklist

Jira ticket: SDTEST-2242

@daniel-mohedano daniel-mohedano added the "type: enhancement", "tag: do not merge", "comp: ci visibility", and "comp: debugger" labels Jul 22, 2025

pr-commenter bot commented Jul 23, 2025

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1757404680 1757405026
end_time 2025-09-09T07:59:21 2025-09-09T08:05:07
git_branch master daniel.mohedano/failed-test-replay
git_commit_sha 03d997e ea090f9
start_time 2025-09-09T07:58:01 2025-09-09T08:03:46
See matching parameters
Baseline Candidate
ci_job_id 1119890044 1119890044
ci_pipeline_id 75918137 75918137
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1757404047 1757404047

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

See unchanged results
scenario:noprobe
  • Δ mean agg_http_req_duration_min: unstable, [-30.841µs; +68.482µs] or [-10.986%; +24.393%]
  • Δ mean agg_http_req_duration_p50: unstable, [-40.989µs; +83.393µs] or [-12.738%; +25.916%]
  • Δ mean agg_http_req_duration_p75: unstable, [-47.386µs; +95.714µs] or [-14.103%; +28.486%]
  • Δ mean agg_http_req_duration_p99: unstable, [-176.081µs; +675.054µs] or [-18.601%; +71.312%]
  • Δ mean throughput: same
scenario:basic
  • Δ mean agg_http_req_duration_min: same
  • Δ mean agg_http_req_duration_p50: same
  • Δ mean agg_http_req_duration_p75: same
  • Δ mean agg_http_req_duration_p99: unstable, [-173.206µs; +68.131µs] or [-21.913%; +8.619%]
  • Δ mean throughput: unstable, [-167.819op/s; +167.819op/s] or [-6.377%; +6.377%]
scenario:loop
  • Δ mean agg_http_req_duration_min: unsure, [-10.192µs; -2.688µs] or [-0.115%; -0.030%]
  • Δ mean agg_http_req_duration_p50: unsure, [-14.522µs; -3.025µs] or [-0.162%; -0.034%]
  • Δ mean agg_http_req_duration_p75: unsure, [-19.409µs; -6.910µs] or [-0.215%; -0.077%]
  • Δ mean agg_http_req_duration_p99: same
  • Δ mean throughput: same
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (321.776 µs) : 289, 354
.   : milestone, 322,
basic (284.368 µs) : 277, 291
.   : milestone, 284,
loop (8.977 ms) : 8972, 8981
.   : milestone, 8977,
section candidate
noprobe (342.978 µs) : 266, 420
.   : milestone, 343,
basic (282.789 µs) : 275, 290
.   : milestone, 283,
loop (8.968 ms) : 8962, 8974
.   : milestone, 8968,
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 321.776 µs [289.41 µs, 354.141 µs]
basic 284.368 µs [277.451 µs, 291.285 µs]
loop 8.977 ms [8.972 ms, 8.981 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 342.978 µs [265.782 µs, 420.173 µs]
basic 282.789 µs [275.407 µs, 290.172 µs]
loop 8.968 ms [8.962 ms, 8.974 ms]

pr-commenter bot commented Jul 23, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1757338973 1757404047
git_commit_sha 03d997e ea090f9
release_version 1.54.0-SNAPSHOT~03d997e2fd 1.51.0-SNAPSHOT~ea090f98cf
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1757405942 1757405942
ci_job_id 1119890037 1119890037
ci_pipeline_id 75918137 75918137
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-1esui62f 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-1esui62f 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 1 performance improvements and 3 performance regressions! Performance is the same for 43 metrics, 12 unstable metrics.

scenario:startup:insecure-bank:tracing:Debugger
  • Δ mean execution_time: worse, [+175.285µs; +392.885µs] or [+2.864%; +6.419%]
  • candidate mean execution_time: 6.405ms
  • baseline mean execution_time: 6.121ms
scenario:startup:petclinic:profiling:Debugger
  • Δ mean execution_time: worse, [+242.424µs; +468.041µs] or [+3.819%; +7.374%]
  • candidate mean execution_time: 6.703ms
  • baseline mean execution_time: 6.347ms
scenario:startup:petclinic:tracing:Debugger
  • Δ mean execution_time: worse, [+256.211µs; +447.719µs] or [+4.216%; +7.368%]
  • candidate mean execution_time: 6.428ms
  • baseline mean execution_time: 6.076ms
scenario:startup:petclinic:tracing:Remote Config
  • Δ mean execution_time: better, [-43.479µs; -16.192µs] or [-6.232%; -2.321%]
  • candidate mean execution_time: 667.800µs
  • baseline mean execution_time: 697.636µs
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1054580
Total [baseline] (8.64 s) : 0, 8640214
Agent [candidate] (1.049 s) : 0, 1048551
Total [candidate] (8.631 s) : 0, 8630751
section iast
Agent [baseline] (1.177 s) : 0, 1177049
Total [baseline] (9.349 s) : 0, 9349431
Agent [candidate] (1.182 s) : 0, 1181600
Total [candidate] (9.389 s) : 0, 9388608
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.055 s -
Agent iast 1.177 s 122.469 ms (11.6%)
Total tracing 8.64 s -
Total iast 9.349 s 709.217 ms (8.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.049 s -
Agent iast 1.182 s 133.049 ms (12.7%)
Total tracing 8.631 s -
Total iast 9.389 s 757.857 ms (8.8%)
gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.449 ms) : 0, 1449
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (737.465 ms) : 0, 737465
BytebuddyAgent [candidate] (733.08 ms) : 0, 733080
GlobalTracer [baseline] (243.601 ms) : 0, 243601
GlobalTracer [candidate] (242.713 ms) : 0, 242713
AppSec [baseline] (30.256 ms) : 0, 30256
AppSec [candidate] (30.067 ms) : 0, 30067
Debugger [baseline] (6.121 ms) : 0, 6121
Debugger [candidate] (6.405 ms) : 0, 6405
Remote Config [baseline] (689.796 µs) : 0, 690
Remote Config [candidate] (670.262 µs) : 0, 670
Telemetry [baseline] (13.835 ms) : 0, 13835
Telemetry [candidate] (13.062 ms) : 0, 13062
section iast
crashtracking [baseline] (1.462 ms) : 0, 1462
crashtracking [candidate] (1.47 ms) : 0, 1470
BytebuddyAgent [baseline] (850.215 ms) : 0, 850215
BytebuddyAgent [candidate] (852.491 ms) : 0, 852491
GlobalTracer [baseline] (232.417 ms) : 0, 232417
GlobalTracer [candidate] (233.45 ms) : 0, 233450
IAST [baseline] (30.831 ms) : 0, 30831
IAST [candidate] (29.43 ms) : 0, 29430
AppSec [baseline] (25.967 ms) : 0, 25967
AppSec [candidate] (27.712 ms) : 0, 27712
Debugger [baseline] (6.538 ms) : 0, 6538
Debugger [candidate] (6.991 ms) : 0, 6991
Remote Config [baseline] (595.36 µs) : 0, 595
Remote Config [candidate] (610.877 µs) : 0, 611
Telemetry [baseline] (8.088 ms) : 0, 8088
Telemetry [candidate] (8.35 ms) : 0, 8350
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1050175
Total [baseline] (10.821 s) : 0, 10821338
Agent [candidate] (1.047 s) : 0, 1046833
Total [candidate] (10.703 s) : 0, 10703180
section appsec
Agent [baseline] (1.223 s) : 0, 1223314
Total [baseline] (10.845 s) : 0, 10845189
Agent [candidate] (1.224 s) : 0, 1224142
Total [candidate] (10.832 s) : 0, 10832013
section iast
Agent [baseline] (1.192 s) : 0, 1191539
Total [baseline] (10.937 s) : 0, 10937257
Agent [candidate] (1.184 s) : 0, 1184126
Total [candidate] (10.991 s) : 0, 10991147
section profiling
Agent [baseline] (1.21 s) : 0, 1209751
Total [baseline] (10.952 s) : 0, 10952397
Agent [candidate] (1.204 s) : 0, 1204315
Total [candidate] (11.042 s) : 0, 11041905
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent appsec 1.223 s 173.14 ms (16.5%)
Agent iast 1.192 s 141.365 ms (13.5%)
Agent profiling 1.21 s 159.576 ms (15.2%)
Total tracing 10.821 s -
Total appsec 10.845 s 23.85 ms (0.2%)
Total iast 10.937 s 115.919 ms (1.1%)
Total profiling 10.952 s 131.059 ms (1.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.047 s -
Agent appsec 1.224 s 177.308 ms (16.9%)
Agent iast 1.184 s 137.292 ms (13.1%)
Agent profiling 1.204 s 157.482 ms (15.0%)
Total tracing 10.703 s -
Total appsec 10.832 s 128.833 ms (1.2%)
Total iast 10.991 s 287.967 ms (2.7%)
Total profiling 11.042 s 338.726 ms (3.2%)
gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.456 ms) : 0, 1456
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (733.365 ms) : 0, 733365
BytebuddyAgent [candidate] (731.88 ms) : 0, 731880
GlobalTracer [baseline] (243.485 ms) : 0, 243485
GlobalTracer [candidate] (242.252 ms) : 0, 242252
AppSec [baseline] (30.147 ms) : 0, 30147
AppSec [candidate] (30.088 ms) : 0, 30088
Debugger [baseline] (6.076 ms) : 0, 6076
Debugger [candidate] (6.428 ms) : 0, 6428
Remote Config [baseline] (697.636 µs) : 0, 698
Remote Config [candidate] (667.8 µs) : 0, 668
Telemetry [baseline] (13.791 ms) : 0, 13791
Telemetry [candidate] (13.037 ms) : 0, 13037
section appsec
crashtracking [baseline] (1.469 ms) : 0, 1469
crashtracking [candidate] (1.454 ms) : 0, 1454
BytebuddyAgent [baseline] (754.446 ms) : 0, 754446
BytebuddyAgent [candidate] (755.543 ms) : 0, 755543
GlobalTracer [baseline] (235.583 ms) : 0, 235583
GlobalTracer [candidate] (235.849 ms) : 0, 235849
IAST [baseline] (23.448 ms) : 0, 23448
IAST [candidate] (23.409 ms) : 0, 23409
AppSec [baseline] (166.065 ms) : 0, 166065
AppSec [candidate] (166.379 ms) : 0, 166379
Debugger [baseline] (11.235 ms) : 0, 11235
Debugger [candidate] (10.604 ms) : 0, 10604
Remote Config [baseline] (624.568 µs) : 0, 625
Remote Config [candidate] (618.183 µs) : 0, 618
Telemetry [baseline] (9.319 ms) : 0, 9319
Telemetry [candidate] (9.272 ms) : 0, 9272
section iast
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.471 ms) : 0, 1471
BytebuddyAgent [baseline] (860.483 ms) : 0, 860483
BytebuddyAgent [candidate] (853.603 ms) : 0, 853603
GlobalTracer [baseline] (235.031 ms) : 0, 235031
GlobalTracer [candidate] (234.144 ms) : 0, 234144
IAST [baseline] (30.61 ms) : 0, 30610
IAST [candidate] (31.283 ms) : 0, 31283
AppSec [baseline] (26.331 ms) : 0, 26331
AppSec [candidate] (27.167 ms) : 0, 27167
Debugger [baseline] (7.479 ms) : 0, 7479
Debugger [candidate] (6.207 ms) : 0, 6207
Remote Config [baseline] (623.346 µs) : 0, 623
Remote Config [candidate] (622.076 µs) : 0, 622
Telemetry [baseline] (8.234 ms) : 0, 8234
Telemetry [candidate] (8.51 ms) : 0, 8510
section profiling
crashtracking [baseline] (1.442 ms) : 0, 1442
crashtracking [candidate] (1.436 ms) : 0, 1436
BytebuddyAgent [baseline] (769.649 ms) : 0, 769649
BytebuddyAgent [candidate] (764.25 ms) : 0, 764250
GlobalTracer [baseline] (224.462 ms) : 0, 224462
GlobalTracer [candidate] (224.552 ms) : 0, 224552
AppSec [baseline] (30.795 ms) : 0, 30795
AppSec [candidate] (30.685 ms) : 0, 30685
Debugger [baseline] (6.347 ms) : 0, 6347
Debugger [candidate] (6.703 ms) : 0, 6703
Remote Config [baseline] (729.498 µs) : 0, 729
Remote Config [candidate] (698.339 µs) : 0, 698
Telemetry [baseline] (16.709 ms) : 0, 16709
Telemetry [candidate] (16.262 ms) : 0, 16262
ProfilingAgent [baseline] (108.709 ms) : 0, 108709
ProfilingAgent [candidate] (108.991 ms) : 0, 108991
Profiling [baseline] (109.359 ms) : 0, 109359
Profiling [candidate] (109.707 ms) : 0, 109707

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1757338973 1757404047
git_commit_sha 03d997e ea090f9
release_version 1.54.0-SNAPSHOT~03d997e2fd 1.51.0-SNAPSHOT~ea090f98cf
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1757405619 1757405619
ci_job_id 1119890038 1119890038
ci_pipeline_id 75918137 75918137
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-yott8yqk 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-yott8yqk 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 2 performance improvements and 3 performance regressions! Performance is the same for 7 metrics, 12 unstable metrics.

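scenario:load:insecure-bank:iast_GLOBAL:high_load
  • Δ mean http_req_duration: better, [-891.387µs; -472.754µs] or [-8.198%; -4.348%]
  • Δ mean throughput: unstable, [-24.648op/s; +81.273op/s] or [-5.764%; +19.007%]
  • candidate mean http_req_duration: 10.191ms; candidate mean throughput: 455.906op/s
  • baseline mean http_req_duration: 10.873ms; baseline mean throughput: 427.594op/s
scenario:load:petclinic:profiling:high_load
  • Δ mean http_req_duration: better, [-2.150ms; -1.174ms] or [-4.304%; -2.349%]
  • Δ mean throughput: unstable, [-3.522op/s; +9.972op/s] or [-3.759%; +10.644%]
  • candidate mean http_req_duration: 48.289ms; candidate mean throughput: 96.912op/s
  • baseline mean http_req_duration: 49.951ms; baseline mean throughput: 93.688op/s
scenario:load:petclinic:appsec:high_load
  • Δ mean http_req_duration: worse, [+1.071ms; +2.021ms] or [+2.217%; +4.181%]
  • Δ mean throughput: unstable, [-9.750op/s; +3.725op/s] or [-10.067%; +3.846%]
  • candidate mean http_req_duration: 49.874ms; candidate mean throughput: 93.838op/s
  • baseline mean http_req_duration: 48.328ms; baseline mean throughput: 96.850op/s
scenario:load:petclinic:code_origins:high_load
  • Δ mean http_req_duration: worse, [+1.213ms; +2.068ms] or [+2.676%; +4.562%]
  • Δ mean throughput: unstable, [-10.548op/s; +3.348op/s] or [-10.220%; +3.244%]
  • candidate mean http_req_duration: 46.976ms; candidate mean throughput: 99.612op/s
  • baseline mean http_req_duration: 45.335ms; baseline mean throughput: 103.213op/s
scenario:load:petclinic:iast:high_load
  • Δ mean http_req_duration: worse, [+1.760ms; +2.617ms] or [+3.961%; +5.891%]
  • Δ mean throughput: unstable, [-12.215op/s; +2.315op/s] or [-11.597%; +2.198%]
  • candidate mean http_req_duration: 46.616ms; candidate mean throughput: 100.375op/s
  • baseline mean http_req_duration: 44.427ms; baseline mean throughput: 105.325op/s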
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.257 ms) : 36954, 37560
.   : milestone, 37257,
appsec (48.328 ms) : 47897, 48759
.   : milestone, 48328,
code_origins (45.335 ms) : 44947, 45724
.   : milestone, 45335,
iast (44.427 ms) : 44035, 44819
.   : milestone, 44427,
profiling (49.951 ms) : 49497, 50405
.   : milestone, 49951,
tracing (43.713 ms) : 43347, 44079
.   : milestone, 43713,
section candidate
no_agent (37.801 ms) : 37498, 38104
.   : milestone, 37801,
appsec (49.874 ms) : 49422, 50325
.   : milestone, 49874,
code_origins (46.976 ms) : 46570, 47382
.   : milestone, 46976,
iast (46.616 ms) : 46211, 47021
.   : milestone, 46616,
profiling (48.289 ms) : 47836, 48743
.   : milestone, 48289,
tracing (42.717 ms) : 42361, 43074
.   : milestone, 42717,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.257 ms [36.954 ms, 37.56 ms] -
appsec 48.328 ms [47.897 ms, 48.759 ms] 11.071 ms (29.7%)
code_origins 45.335 ms [44.947 ms, 45.724 ms] 8.078 ms (21.7%)
iast 44.427 ms [44.035 ms, 44.819 ms] 7.17 ms (19.2%)
profiling 49.951 ms [49.497 ms, 50.405 ms] 12.694 ms (34.1%)
tracing 43.713 ms [43.347 ms, 44.079 ms] 6.456 ms (17.3%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.801 ms [37.498 ms, 38.104 ms] -
appsec 49.874 ms [49.422 ms, 50.325 ms] 12.073 ms (31.9%)
code_origins 46.976 ms [46.57 ms, 47.382 ms] 9.175 ms (24.3%)
iast 46.616 ms [46.211 ms, 47.021 ms] 8.815 ms (23.3%)
profiling 48.289 ms [47.836 ms, 48.743 ms] 10.488 ms (27.7%)
tracing 42.717 ms [42.361 ms, 43.074 ms] 4.916 ms (13.0%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.421 ms) : 4372, 4470
.   : milestone, 4421,
iast (9.533 ms) : 9373, 9692
.   : milestone, 9533,
iast_FULL (14.086 ms) : 13809, 14364
.   : milestone, 14086,
iast_GLOBAL (10.873 ms) : 10681, 11066
.   : milestone, 10873,
profiling (9.087 ms) : 8945, 9230
.   : milestone, 9087,
tracing (7.698 ms) : 7586, 7809
.   : milestone, 7698,
section candidate
no_agent (4.426 ms) : 4370, 4482
.   : milestone, 4426,
iast (9.311 ms) : 9158, 9465
.   : milestone, 9311,
iast_FULL (14.053 ms) : 13770, 14337
.   : milestone, 14053,
iast_GLOBAL (10.191 ms) : 9995, 10388
.   : milestone, 10191,
profiling (8.772 ms) : 8632, 8913
.   : milestone, 8772,
tracing (7.623 ms) : 7506, 7739
.   : milestone, 7623,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.421 ms [4.372 ms, 4.47 ms] -
iast 9.533 ms [9.373 ms, 9.692 ms] 5.112 ms (115.6%)
iast_FULL 14.086 ms [13.809 ms, 14.364 ms] 9.665 ms (218.6%)
iast_GLOBAL 10.873 ms [10.681 ms, 11.066 ms] 6.452 ms (146.0%)
profiling 9.087 ms [8.945 ms, 9.23 ms] 4.667 ms (105.6%)
tracing 7.698 ms [7.586 ms, 7.809 ms] 3.277 ms (74.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.426 ms [4.37 ms, 4.482 ms] -
iast 9.311 ms [9.158 ms, 9.465 ms] 4.885 ms (110.4%)
iast_FULL 14.053 ms [13.77 ms, 14.337 ms] 9.627 ms (217.5%)
iast_GLOBAL 10.191 ms [9.995 ms, 10.388 ms] 5.765 ms (130.3%)
profiling 8.772 ms [8.632 ms, 8.913 ms] 4.346 ms (98.2%)
tracing 7.623 ms [7.506 ms, 7.739 ms] 3.197 ms (72.2%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1757338973 1757404047
git_commit_sha 03d997e ea090f9
release_version 1.54.0-SNAPSHOT~03d997e2fd 1.51.0-SNAPSHOT~ea090f98cf
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1757406152 1757406152
ci_job_id 1119890039 1119890039
ci_pipeline_id 75918137 75918137
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-0ni5qxgy 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-0ni5qxgy 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.491 ms) : 1479, 1502
.   : milestone, 1491,
appsec (3.671 ms) : 3455, 3887
.   : milestone, 3671,
iast (2.211 ms) : 2148, 2274
.   : milestone, 2211,
iast_GLOBAL (2.255 ms) : 2192, 2318
.   : milestone, 2255,
profiling (2.055 ms) : 2004, 2105
.   : milestone, 2055,
tracing (2.017 ms) : 1969, 2066
.   : milestone, 2017,
section candidate
no_agent (1.49 ms) : 1478, 1502
.   : milestone, 1490,
appsec (3.688 ms) : 3470, 3906
.   : milestone, 3688,
iast (2.207 ms) : 2145, 2270
.   : milestone, 2207,
iast_GLOBAL (2.252 ms) : 2189, 2315
.   : milestone, 2252,
profiling (2.062 ms) : 2011, 2112
.   : milestone, 2062,
tracing (2.036 ms) : 1987, 2084
.   : milestone, 2036,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.491 ms [1.479 ms, 1.502 ms] -
appsec 3.671 ms [3.455 ms, 3.887 ms] 2.18 ms (146.3%)
iast 2.211 ms [2.148 ms, 2.274 ms] 720.485 µs (48.3%)
iast_GLOBAL 2.255 ms [2.192 ms, 2.318 ms] 764.842 µs (51.3%)
profiling 2.055 ms [2.004 ms, 2.105 ms] 564.289 µs (37.9%)
tracing 2.017 ms [1.969 ms, 2.066 ms] 526.712 µs (35.3%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.49 ms [1.478 ms, 1.502 ms] -
appsec 3.688 ms [3.47 ms, 3.906 ms] 2.198 ms (147.5%)
iast 2.207 ms [2.145 ms, 2.27 ms] 717.166 µs (48.1%)
iast_GLOBAL 2.252 ms [2.189 ms, 2.315 ms] 762.031 µs (51.1%)
profiling 2.062 ms [2.011 ms, 2.112 ms] 571.55 µs (38.4%)
tracing 2.036 ms [1.987 ms, 2.084 ms] 545.475 µs (36.6%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.302 s) : 15302000, 15302000
.   : milestone, 15302000,
appsec (14.905 s) : 14905000, 14905000
.   : milestone, 14905000,
iast (18.757 s) : 18757000, 18757000
.   : milestone, 18757000,
iast_GLOBAL (17.943 s) : 17943000, 17943000
.   : milestone, 17943000,
profiling (15.34 s) : 15340000, 15340000
.   : milestone, 15340000,
tracing (14.865 s) : 14865000, 14865000
.   : milestone, 14865000,
section candidate
no_agent (15.611 s) : 15611000, 15611000
.   : milestone, 15611000,
appsec (15.039 s) : 15039000, 15039000
.   : milestone, 15039000,
iast (19.1 s) : 19100000, 19100000
.   : milestone, 19100000,
iast_GLOBAL (18.275 s) : 18275000, 18275000
.   : milestone, 18275000,
profiling (15.768 s) : 15768000, 15768000
.   : milestone, 15768000,
tracing (14.993 s) : 14993000, 14993000
.   : milestone, 14993000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.302 s [15.302 s, 15.302 s] -
appsec 14.905 s [14.905 s, 14.905 s] -397.0 ms (-2.6%)
iast 18.757 s [18.757 s, 18.757 s] 3.455 s (22.6%)
iast_GLOBAL 17.943 s [17.943 s, 17.943 s] 2.641 s (17.3%)
profiling 15.34 s [15.34 s, 15.34 s] 38.0 ms (0.2%)
tracing 14.865 s [14.865 s, 14.865 s] -437.0 ms (-2.9%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.611 s [15.611 s, 15.611 s] -
appsec 15.039 s [15.039 s, 15.039 s] -572.0 ms (-3.7%)
iast 19.1 s [19.1 s, 19.1 s] 3.489 s (22.3%)
iast_GLOBAL 18.275 s [18.275 s, 18.275 s] 2.664 s (17.1%)
profiling 15.768 s [15.768 s, 15.768 s] 157.0 ms (1.0%)
tracing 14.993 s [14.993 s, 14.993 s] -618.0 ms (-4.0%)

datadog-official bot commented Aug 11, 2025

🎯 Code Coverage
Patch Coverage: 61.25%
Total Coverage: 57.84% (+0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ea090f9

@daniel-mohedano daniel-mohedano changed the title from "Failed Test Replay" to "Implement Failed Test Replay" Aug 12, 2025
@daniel-mohedano daniel-mohedano removed the "tag: do not merge" label Aug 13, 2025
Comment on lines 22 to 25
if (UPDATER.get() != null) {
  LOGGER.debug("DebuggerConfigUpdater available, performing update");
  UPDATER.get().updateConfig(update);
} else {
Member

UPDATER can change between the null check and the call. You should read it into a local variable once and call updateConfig(update) on that local variable.

Contributor Author

fixed in d80e531
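
For readers following the thread, the suggested pattern can be sketched as below. This is an illustration and not the change made in d80e531: `UpdaterHolder` and `tryUpdate` are hypothetical names, and `Consumer<String>` stands in for the real updater type.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class UpdaterHolder {
  // Stand-in for the UPDATER reference discussed above; Consumer<String> is a placeholder type.
  static final AtomicReference<Consumer<String>> UPDATER = new AtomicReference<>();

  /** Returns true if the update was delivered, false if no updater was registered. */
  static boolean tryUpdate(String update) {
    // Single read: the null check and the call below both use the same value,
    // even if another thread replaces or clears UPDATER in the meantime.
    Consumer<String> updater = UPDATER.get();
    if (updater == null) {
      return false;
    }
    updater.accept(update);
    return true;
  }
}
```

Calling `UPDATER.get()` twice, as in the snippet under review, leaves a window in which the second read could return a different value than the one that was checked.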

return update != null ? update : existing;
}

public static final class Builder {
Member

nit: this builder sounds overkill to me, I think we can manage to call the DebuggerConfigUpdate constructor with 4 parameters (null or boolean)

Contributor Author

i agree, in hindsight it is too much logic for its use, addressed in d80e531
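
A constructor-only version along the lines suggested above might look like the following sketch. The four field names come from the snippets quoted in this review; the class name `ConfigUpdateSketch`, the `mergedWith` method, and the `coalesce` helper are assumptions made for illustration and may not match the code in d80e531.

```java
/** Illustrative value class: each field is null when the update leaves that setting untouched. */
final class ConfigUpdateSketch {
  final Boolean dynamicInstrumentationEnabled;
  final Boolean exceptionReplayEnabled;
  final Boolean codeOriginEnabled;
  final Boolean distributedDebuggerEnabled;

  ConfigUpdateSketch(
      Boolean dynamicInstrumentationEnabled,
      Boolean exceptionReplayEnabled,
      Boolean codeOriginEnabled,
      Boolean distributedDebuggerEnabled) {
    this.dynamicInstrumentationEnabled = dynamicInstrumentationEnabled;
    this.exceptionReplayEnabled = exceptionReplayEnabled;
    this.codeOriginEnabled = codeOriginEnabled;
    this.distributedDebuggerEnabled = distributedDebuggerEnabled;
  }

  /** Returns a new instance in which non-null fields of {@code update} override this one. */
  ConfigUpdateSketch mergedWith(ConfigUpdateSketch update) {
    return new ConfigUpdateSketch(
        coalesce(dynamicInstrumentationEnabled, update.dynamicInstrumentationEnabled),
        coalesce(exceptionReplayEnabled, update.exceptionReplayEnabled),
        coalesce(codeOriginEnabled, update.codeOriginEnabled),
        coalesce(distributedDebuggerEnabled, update.distributedDebuggerEnabled));
  }

  // Same rule as the line under review: prefer the update when it is present.
  private static Boolean coalesce(Boolean existing, Boolean update) {
    return update != null ? update : existing;
  }
}
```

A caller that only wants to enable Exception Replay would then write `new ConfigUpdateSketch(null, true, null, null)` without needing a builder.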

return;
}
if (UPDATER.get() != null) {
LOGGER.debug("DebuggerConfigUpdater available, performing update");
Member

can we log the content of update to know what is passed to the update?

Contributor Author

added in d80e531

Comment on lines 35 to 42
public boolean equals(Object o) {
  if (!(o instanceof DebuggerConfigUpdate)) return false;
  DebuggerConfigUpdate that = (DebuggerConfigUpdate) o;
  return Objects.equals(dynamicInstrumentationEnabled, that.dynamicInstrumentationEnabled)
      && Objects.equals(exceptionReplayEnabled, that.exceptionReplayEnabled)
      && Objects.equals(codeOriginEnabled, that.codeOriginEnabled)
      && Objects.equals(distributedDebuggerEnabled, that.distributedDebuggerEnabled);
}
Member

is this used somewhere? otherwise we can remove

Contributor Author

removed in d80e531

Comment on lines 59 to 65
public int hashCode() {
  return Objects.hash(
      dynamicInstrumentationEnabled,
      exceptionReplayEnabled,
      codeOriginEnabled,
      distributedDebuggerEnabled);
}
Member

is this used somewhere? otherwise we can remove

Contributor Author

removed in d80e531

Comment on lines 90 to 105
if (failedTestReplayMode) {
  TestContext testContext = InstrumentationTestBridge.getCurrentTestContext();
  if (testContext == null) {
    return;
  }
  TestExecutionHistory executionHistory = testContext.get(TestExecutionHistory.class);
  if (executionHistory == null || !executionHistory.failedTestReplayApplicable()) {
    return;
  }
} else {
  if (t instanceof Error) {
    if (LOGGER.isDebugEnabled()) {
      LOGGER.debug("Skip handling error: {}", t.toString());
    }
    return;
  }
Member

Can we consider implementing an extended version of DefaultExceptionDebugger specifically for FTR/CIViz?

We could refactor the class to be more extensible and add some specialized methods for the FTR flow.
That sounds better than tangling the FTR dependency inside this class.

Member

That could also avoid having an internal config param for async config.

Contributor Author

it's a good point, addressed in 34cf624 and separated the logic between the default implementation and the FTR specific implementation, should also be easier to maintain and makes the implementation easier to follow 👍
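
The separation discussed in this thread can be pictured with a small template-method sketch. It is only an illustration under assumptions: `BaseExceptionHandlerSketch`, `FailedTestReplayHandlerSketch`, and the `shouldHandle` hook are hypothetical names, the `instrumented` list is a placeholder for probe creation, and the real classes in 34cf624 may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Sketch of a base handler with an overridable hook; not the real DefaultExceptionDebugger. */
class BaseExceptionHandlerSketch {
  final List<Throwable> instrumented = new ArrayList<>();

  /** Template method: subclasses tune the hook instead of adding mode flags here. */
  final void handleException(Throwable t) {
    if (!shouldHandle(t)) {
      return;
    }
    instrumented.add(t); // placeholder for "create a probe for this exception"
  }

  /** Default behaviour shown in the snippet under review: Errors are skipped. */
  protected boolean shouldHandle(Throwable t) {
    return !(t instanceof Error);
  }
}

/** Failed Test Replay variant: handles Errors too, but only when replay applies to the test. */
class FailedTestReplayHandlerSketch extends BaseExceptionHandlerSketch {
  private final BooleanSupplier replayApplicable; // stand-in for the test execution history check

  FailedTestReplayHandlerSketch(BooleanSupplier replayApplicable) {
    this.replayApplicable = replayApplicable;
  }

  @Override
  protected boolean shouldHandle(Throwable t) {
    return replayApplicable.getAsBoolean();
  }
}
```

Keeping the test-specific check in a subclass means the default handler has no dependency on test context types.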

}

public void stop() {
lowRateFlush(this);
Member

in that case, add
snapshotSink.highRateFlush(null);
also

Contributor Author

addressed in d80e531
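
The reason for flushing on stop can be shown with a minimal buffering sink. This sketch assumes a single in-memory buffer and uses hypothetical names (`BufferingSinkSketch`, `add`, `flush`, `sent`); it does not reproduce DebuggerSink or its separate low-rate and high-rate flush paths.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a sink that batches snapshots and must drain its buffer before shutdown. */
final class BufferingSinkSketch {
  private final List<String> buffer = new ArrayList<>();
  private final List<String> sent = new ArrayList<>(); // placeholder for the upload call

  synchronized void add(String snapshot) {
    buffer.add(snapshot);
  }

  /** Normally driven by a periodic task; moves everything buffered so far to "sent". */
  synchronized void flush() {
    sent.addAll(buffer);
    buffer.clear();
  }

  /** Without this final flush, snapshots added after the last periodic flush would be lost. */
  void stop() {
    flush();
  }

  synchronized List<String> sent() {
    return new ArrayList<>(sent);
  }
}
```

In a short-lived test session the periodic flush may never fire, so the flush in `stop()` is what delivers the snapshots.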

private final String service;
private final DebuggerIntakeRequestData debugger;
private final String ddsource = "dd_debugger";
private final String product;
Member

Can I have more information about this?
what is the purpose?
is it standard for all tracers, coming from EVP?

Contributor Author

given that snapshots are sent through logs to the backend, in the original Failed Test Replay RFC the team decided that it would be good to have a way of identifying these specific logs (only sent by Failed Test Replay) to avoid billing customers in the future when using the feature

Contributor Author

@daniel-mohedano daniel-mohedano Sep 4, 2025

AFAIK this is only done in JS, but it is also currently the only implementation of Failed Test Replay

Member

As discussed offline, let's remove this for now. We will probably have a mechanism to differentiate snapshot origin soon for our own needs, so you can reuse it.

Contributor Author

removed in f1f1264 👍

private String loggerThreadName;

public IntakeRequest(String service, DebuggerIntakeRequestData debugger) {
public IntakeRequest(String service, DebuggerIntakeRequestData debugger, Config config) {
Member

Not sure we want to pass the config object just for the product attribute...

maxCapturedFrames);
config.getDebuggerMaxExceptionPerSecond(),
config.getDebuggerExceptionMaxCapturedFrames(),
config.isDebuggerExceptionAsyncConfig());
Member

I don't want to keep this config param in Config; we can keep it local for the needs of FTR.

Contributor Author

@daniel-mohedano daniel-mohedano Sep 8, 2025

removed in f1f1264 to always use the async instrumentation

# Conflicts:
#	dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/events/TestEventsHandlerImpl.java
#	dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/execution/RunNTimes.java
#	dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/execution/RunOnceIgnoreOutcome.java
#	internal-api/src/main/java/datadog/trace/api/civisibility/execution/TestExecutionHistory.java
Contributor

@nikita-tkachenko-datadog nikita-tkachenko-datadog left a comment

Good job, can't wait to see this dogfooded!

}

maybeStartAppSec(scoClass, sco);
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode
Contributor

I assume we can remove this comment now?

Contributor

The long-awaited smoke test for the headless mode, nice!

return JAVA_HOME + separator + "bin" + separator + "javac"
}

String javaToolOptions(List<String> additionalAgentArgs) {
Contributor

I wonder if these can be extracted to CiVisibilitySmokeTest, they seem to look awfully similar for the 3 smoke tests we have


public static void setUpdater(@Nonnull DebuggerConfigUpdater updater) {
DebuggerConfigUpdater oldUpdater = UPDATER.getAndSet(updater);
if (oldUpdater == null) {
Contributor

@nikita-tkachenko-datadog nikita-tkachenko-datadog Sep 8, 2025

As discussed offline, let's mark these two methods as synchronized and make the UPDATER field a regular volatile reference (to ensure visibility in the is... methods below).

Given that the methods are called rarely (the updater is set once at startup; the config is updated once at startup and then whenever the service settings change), there shouldn't be contention for the lock, and it'll be much easier to reason about the methods' correctness.

Current implementation permits executing updates out of order:

  1. Update arrives and is set as deferred
  2. Updater is set, thread is preempted before we process the deferred update
  3. Another update arrives in a different thread, sees the updater and gets executed
  4. The thread setting updater resumes executing and applies the deferred update out of order
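
The interleaving above cannot happen once both entry points take the same lock. Below is a minimal sketch of that idea, not the code from this PR: the class ConfigBridgeSketch, the Updater interface, and the use of String as the update type are hypothetical stand-ins; only the names setUpdater, updateConfig, reset, UPDATER and DEFERRED_UPDATE are taken from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigBridgeSketch {
  /** Hypothetical stand-in for the real updater type. */
  public interface Updater {
    void updateConfig(String update);
  }

  // volatile so that lock-free readers (the is... methods) see the latest reference
  private static volatile Updater UPDATER;
  // only read and written while holding the class lock
  private static String DEFERRED_UPDATE;

  public static synchronized void updateConfig(String update) {
    if (UPDATER != null) {
      UPDATER.updateConfig(update);
    } else {
      // no updater yet: remember the latest update until one is registered
      DEFERRED_UPDATE = update;
    }
  }

  public static synchronized void setUpdater(Updater updater) {
    UPDATER = updater;
    if (DEFERRED_UPDATE != null) {
      // applied under the same lock, so no later update can overtake it
      updater.updateConfig(DEFERRED_UPDATE);
      DEFERRED_UPDATE = null;
    }
  }

  // for testing purposes; synchronized so both writes are visible to other threads
  public static synchronized void reset() {
    UPDATER = null;
    DEFERRED_UPDATE = null;
  }

  public static void main(String[] args) {
    List<String> applied = new ArrayList<>();
    updateConfig("first");   // deferred, no updater yet
    setUpdater(applied::add); // applies the deferred update
    updateConfig("second");  // applied directly
    System.out.println(applied); // prints [first, second]
  }
}
```

Because setUpdater holds the lock from the moment it publishes the updater until the deferred update has been applied, a concurrent updateConfig call has to wait and therefore runs after the deferred one.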

Comment on lines 31 to 34
DebuggerConfigUpdate toApply = DEFERRED_UPDATE;
DEFERRED_UPDATE = null;
LOGGER.debug("Processing deferred update {}", toApply);
updater.updateConfig(toApply);
Contributor

Nitpick: no need to do the local var dance anymore, now that the field is guarded by the lock:

Suggested change
DebuggerConfigUpdate toApply = DEFERRED_UPDATE;
DEFERRED_UPDATE = null;
LOGGER.debug("Processing deferred update {}", toApply);
updater.updateConfig(toApply);
LOGGER.debug("Processing deferred update {}", toApply);
updater.updateConfig(DEFERRED_UPDATE);
DEFERRED_UPDATE = null;

Member

inside the method yes, but unfortunately for other methods it's not true.

Member

@jpbempel jpbempel left a comment

Playing with synchronized and volatile needs to be done correctly 😄
or you go full synchronized

Comment on lines 44 to 47
public static boolean isDynamicInstrumentationEnabled() {
if (UPDATER != null) {
return UPDATER.isDynamicInstrumentationEnabled();
}
Member

need a local variable here: the UPDATER field can be updated between the null check and the call

Comment on lines 52 to 54
if (UPDATER != null) {
return UPDATER.isExceptionReplayEnabled();
}
Member

need a local variable here: the UPDATER field can be updated between the null check and the call

Comment on lines 59 to 61
if (UPDATER != null) {
return UPDATER.isCodeOriginEnabled();
}
Member

need a local variable here: the UPDATER field can be updated between the null check and the call

Comment on lines 66 to 68
if (UPDATER != null) {
return UPDATER.isDistributedDebuggerEnabled();
}
Member

need a local variable here: the UPDATER field can be updated between the null check and the call

// for testing purposes
static void reset() {
UPDATER = null;
DEFERRED_UPDATE = null;
Member

@jpbempel jpbempel Sep 9, 2025

Even this can be problematic for tests: the null assigned to DEFERRED_UPDATE may not be visible to other threads.

@nikita-tkachenko-datadog
Contributor

Playing with synchronized and volatile needs to be done correctly 😄 or you go full synchronized

True, my assumption was that UPDATER was never going to be set to null, but I overlooked the reset method. Synchronising reset and saving the result of the first read into a local var in the is... methods should do it
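
The local-variable half of that fix can be sketched as follows. This is an illustration under assumptions rather than the merged code: LocalReadSketch and its Updater interface are hypothetical stand-ins, and the false fallback is a placeholder because the thread does not show what the real methods return when no updater is set.

```java
public class LocalReadSketch {
  /** Hypothetical stand-in for the real updater type. */
  public interface Updater {
    boolean isDynamicInstrumentationEnabled();
  }

  private static volatile Updater UPDATER;

  public static void setUpdater(Updater updater) {
    UPDATER = updater;
  }

  public static boolean isDynamicInstrumentationEnabled() {
    // Read the volatile field exactly once. Checking UPDATER != null and then
    // calling UPDATER.is...() reads it twice, and a concurrent reset() could
    // null it between the two reads, causing a NullPointerException.
    Updater updater = UPDATER;
    if (updater != null) {
      return updater.isDynamicInstrumentationEnabled();
    }
    return false; // placeholder fallback, assumed for this sketch
  }

  public static void main(String[] args) {
    System.out.println(isDynamicInstrumentationEnabled()); // prints false
    setUpdater(() -> true);
    System.out.println(isDynamicInstrumentationEnabled()); // prints true
  }
}
```

The readers stay lock-free: the volatile read guarantees they see the most recently published updater, and the local copy guarantees the null check and the call use the same reference.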

@daniel-mohedano
Contributor Author

fixed in ea090f9, thank you both for the suggestions 👍!

Member

@jpbempel jpbempel left a comment

LGTM from debugger perspective

@daniel-mohedano daniel-mohedano merged commit cb81aca into master Sep 9, 2025
510 checks passed
@daniel-mohedano daniel-mohedano deleted the daniel.mohedano/failed-test-replay branch September 9, 2025 09:02
@github-actions github-actions bot added this to the 1.54.0 milestone Sep 9, 2025

Labels

comp: ci visibility Continuous Integration Visibility comp: debugger Dynamic Instrumentation type: enhancement Enhancements and improvements

4 participants