- 
                Notifications
    You must be signed in to change notification settings 
- Fork 315
Implement Failed Test Replay #9214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Debugger benchmarksParameters
 See matching parameters
 SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics. See unchanged results
 Request duration reports for reportsgantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (321.776 µs) : 289, 354
.   : milestone, 322,
basic (284.368 µs) : 277, 291
.   : milestone, 284,
loop (8.977 ms) : 8972, 8981
.   : milestone, 8977,
section candidate
noprobe (342.978 µs) : 266, 420
.   : milestone, 343,
basic (282.789 µs) : 275, 290
.   : milestone, 283,
loop (8.968 ms) : 8962, 8974
.   : milestone, 8968,
 
 
 
 | 
| BenchmarksStartupParameters
 See matching parameters
 SummaryFound 1 performance improvements and 3 performance regressions! Performance is the same for 43 metrics, 12 unstable metrics. 
 Startup time reports for insecure-bankgantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1054580
Total [baseline] (8.64 s) : 0, 8640214
Agent [candidate] (1.049 s) : 0, 1048551
Total [candidate] (8.631 s) : 0, 8630751
section iast
Agent [baseline] (1.177 s) : 0, 1177049
Total [baseline] (9.349 s) : 0, 9349431
Agent [candidate] (1.182 s) : 0, 1181600
Total [candidate] (9.389 s) : 0, 9388608
 
 
 
 gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.449 ms) : 0, 1449
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (737.465 ms) : 0, 737465
BytebuddyAgent [candidate] (733.08 ms) : 0, 733080
GlobalTracer [baseline] (243.601 ms) : 0, 243601
GlobalTracer [candidate] (242.713 ms) : 0, 242713
AppSec [baseline] (30.256 ms) : 0, 30256
AppSec [candidate] (30.067 ms) : 0, 30067
Debugger [baseline] (6.121 ms) : 0, 6121
Debugger [candidate] (6.405 ms) : 0, 6405
Remote Config [baseline] (689.796 µs) : 0, 690
Remote Config [candidate] (670.262 µs) : 0, 670
Telemetry [baseline] (13.835 ms) : 0, 13835
Telemetry [candidate] (13.062 ms) : 0, 13062
section iast
crashtracking [baseline] (1.462 ms) : 0, 1462
crashtracking [candidate] (1.47 ms) : 0, 1470
BytebuddyAgent [baseline] (850.215 ms) : 0, 850215
BytebuddyAgent [candidate] (852.491 ms) : 0, 852491
GlobalTracer [baseline] (232.417 ms) : 0, 232417
GlobalTracer [candidate] (233.45 ms) : 0, 233450
IAST [baseline] (30.831 ms) : 0, 30831
IAST [candidate] (29.43 ms) : 0, 29430
AppSec [baseline] (25.967 ms) : 0, 25967
AppSec [candidate] (27.712 ms) : 0, 27712
Debugger [baseline] (6.538 ms) : 0, 6538
Debugger [candidate] (6.991 ms) : 0, 6991
Remote Config [baseline] (595.36 µs) : 0, 595
Remote Config [candidate] (610.877 µs) : 0, 611
Telemetry [baseline] (8.088 ms) : 0, 8088
Telemetry [candidate] (8.35 ms) : 0, 8350
Startup time reports for petclinicgantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1050175
Total [baseline] (10.821 s) : 0, 10821338
Agent [candidate] (1.047 s) : 0, 1046833
Total [candidate] (10.703 s) : 0, 10703180
section appsec
Agent [baseline] (1.223 s) : 0, 1223314
Total [baseline] (10.845 s) : 0, 10845189
Agent [candidate] (1.224 s) : 0, 1224142
Total [candidate] (10.832 s) : 0, 10832013
section iast
Agent [baseline] (1.192 s) : 0, 1191539
Total [baseline] (10.937 s) : 0, 10937257
Agent [candidate] (1.184 s) : 0, 1184126
Total [candidate] (10.991 s) : 0, 10991147
section profiling
Agent [baseline] (1.21 s) : 0, 1209751
Total [baseline] (10.952 s) : 0, 10952397
Agent [candidate] (1.204 s) : 0, 1204315
Total [candidate] (11.042 s) : 0, 11041905
 
 
 
 gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.456 ms) : 0, 1456
crashtracking [candidate] (1.444 ms) : 0, 1444
BytebuddyAgent [baseline] (733.365 ms) : 0, 733365
BytebuddyAgent [candidate] (731.88 ms) : 0, 731880
GlobalTracer [baseline] (243.485 ms) : 0, 243485
GlobalTracer [candidate] (242.252 ms) : 0, 242252
AppSec [baseline] (30.147 ms) : 0, 30147
AppSec [candidate] (30.088 ms) : 0, 30088
Debugger [baseline] (6.076 ms) : 0, 6076
Debugger [candidate] (6.428 ms) : 0, 6428
Remote Config [baseline] (697.636 µs) : 0, 698
Remote Config [candidate] (667.8 µs) : 0, 668
Telemetry [baseline] (13.791 ms) : 0, 13791
Telemetry [candidate] (13.037 ms) : 0, 13037
section appsec
crashtracking [baseline] (1.469 ms) : 0, 1469
crashtracking [candidate] (1.454 ms) : 0, 1454
BytebuddyAgent [baseline] (754.446 ms) : 0, 754446
BytebuddyAgent [candidate] (755.543 ms) : 0, 755543
GlobalTracer [baseline] (235.583 ms) : 0, 235583
GlobalTracer [candidate] (235.849 ms) : 0, 235849
IAST [baseline] (23.448 ms) : 0, 23448
IAST [candidate] (23.409 ms) : 0, 23409
AppSec [baseline] (166.065 ms) : 0, 166065
AppSec [candidate] (166.379 ms) : 0, 166379
Debugger [baseline] (11.235 ms) : 0, 11235
Debugger [candidate] (10.604 ms) : 0, 10604
Remote Config [baseline] (624.568 µs) : 0, 625
Remote Config [candidate] (618.183 µs) : 0, 618
Telemetry [baseline] (9.319 ms) : 0, 9319
Telemetry [candidate] (9.272 ms) : 0, 9272
section iast
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.471 ms) : 0, 1471
BytebuddyAgent [baseline] (860.483 ms) : 0, 860483
BytebuddyAgent [candidate] (853.603 ms) : 0, 853603
GlobalTracer [baseline] (235.031 ms) : 0, 235031
GlobalTracer [candidate] (234.144 ms) : 0, 234144
IAST [baseline] (30.61 ms) : 0, 30610
IAST [candidate] (31.283 ms) : 0, 31283
AppSec [baseline] (26.331 ms) : 0, 26331
AppSec [candidate] (27.167 ms) : 0, 27167
Debugger [baseline] (7.479 ms) : 0, 7479
Debugger [candidate] (6.207 ms) : 0, 6207
Remote Config [baseline] (623.346 µs) : 0, 623
Remote Config [candidate] (622.076 µs) : 0, 622
Telemetry [baseline] (8.234 ms) : 0, 8234
Telemetry [candidate] (8.51 ms) : 0, 8510
section profiling
crashtracking [baseline] (1.442 ms) : 0, 1442
crashtracking [candidate] (1.436 ms) : 0, 1436
BytebuddyAgent [baseline] (769.649 ms) : 0, 769649
BytebuddyAgent [candidate] (764.25 ms) : 0, 764250
GlobalTracer [baseline] (224.462 ms) : 0, 224462
GlobalTracer [candidate] (224.552 ms) : 0, 224552
AppSec [baseline] (30.795 ms) : 0, 30795
AppSec [candidate] (30.685 ms) : 0, 30685
Debugger [baseline] (6.347 ms) : 0, 6347
Debugger [candidate] (6.703 ms) : 0, 6703
Remote Config [baseline] (729.498 µs) : 0, 729
Remote Config [candidate] (698.339 µs) : 0, 698
Telemetry [baseline] (16.709 ms) : 0, 16709
Telemetry [candidate] (16.262 ms) : 0, 16262
ProfilingAgent [baseline] (108.709 ms) : 0, 108709
ProfilingAgent [candidate] (108.991 ms) : 0, 108991
Profiling [baseline] (109.359 ms) : 0, 109359
Profiling [candidate] (109.707 ms) : 0, 109707
LoadParameters
 See matching parameters
 SummaryFound 2 performance improvements and 3 performance regressions! Performance is the same for 7 metrics, 12 unstable metrics. 
 Request duration reports for petclinicgantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.257 ms) : 36954, 37560
.   : milestone, 37257,
appsec (48.328 ms) : 47897, 48759
.   : milestone, 48328,
code_origins (45.335 ms) : 44947, 45724
.   : milestone, 45335,
iast (44.427 ms) : 44035, 44819
.   : milestone, 44427,
profiling (49.951 ms) : 49497, 50405
.   : milestone, 49951,
tracing (43.713 ms) : 43347, 44079
.   : milestone, 43713,
section candidate
no_agent (37.801 ms) : 37498, 38104
.   : milestone, 37801,
appsec (49.874 ms) : 49422, 50325
.   : milestone, 49874,
code_origins (46.976 ms) : 46570, 47382
.   : milestone, 46976,
iast (46.616 ms) : 46211, 47021
.   : milestone, 46616,
profiling (48.289 ms) : 47836, 48743
.   : milestone, 48289,
tracing (42.717 ms) : 42361, 43074
.   : milestone, 42717,
 
 
 
 Request duration reports for insecure-bankgantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.421 ms) : 4372, 4470
.   : milestone, 4421,
iast (9.533 ms) : 9373, 9692
.   : milestone, 9533,
iast_FULL (14.086 ms) : 13809, 14364
.   : milestone, 14086,
iast_GLOBAL (10.873 ms) : 10681, 11066
.   : milestone, 10873,
profiling (9.087 ms) : 8945, 9230
.   : milestone, 9087,
tracing (7.698 ms) : 7586, 7809
.   : milestone, 7698,
section candidate
no_agent (4.426 ms) : 4370, 4482
.   : milestone, 4426,
iast (9.311 ms) : 9158, 9465
.   : milestone, 9311,
iast_FULL (14.053 ms) : 13770, 14337
.   : milestone, 14053,
iast_GLOBAL (10.191 ms) : 9995, 10388
.   : milestone, 10191,
profiling (8.772 ms) : 8632, 8913
.   : milestone, 8772,
tracing (7.623 ms) : 7506, 7739
.   : milestone, 7623,
 
 
 
 DacapoParameters
 See matching parameters
 SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.491 ms) : 1479, 1502
.   : milestone, 1491,
appsec (3.671 ms) : 3455, 3887
.   : milestone, 3671,
iast (2.211 ms) : 2148, 2274
.   : milestone, 2211,
iast_GLOBAL (2.255 ms) : 2192, 2318
.   : milestone, 2255,
profiling (2.055 ms) : 2004, 2105
.   : milestone, 2055,
tracing (2.017 ms) : 1969, 2066
.   : milestone, 2017,
section candidate
no_agent (1.49 ms) : 1478, 1502
.   : milestone, 1490,
appsec (3.688 ms) : 3470, 3906
.   : milestone, 3688,
iast (2.207 ms) : 2145, 2270
.   : milestone, 2207,
iast_GLOBAL (2.252 ms) : 2189, 2315
.   : milestone, 2252,
profiling (2.062 ms) : 2011, 2112
.   : milestone, 2062,
tracing (2.036 ms) : 1987, 2084
.   : milestone, 2036,
 
 
 
 Execution time for biojavagantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~ea090f98cf, baseline=1.54.0-SNAPSHOT~03d997e2fd
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.302 s) : 15302000, 15302000
.   : milestone, 15302000,
appsec (14.905 s) : 14905000, 14905000
.   : milestone, 14905000,
iast (18.757 s) : 18757000, 18757000
.   : milestone, 18757000,
iast_GLOBAL (17.943 s) : 17943000, 17943000
.   : milestone, 17943000,
profiling (15.34 s) : 15340000, 15340000
.   : milestone, 15340000,
tracing (14.865 s) : 14865000, 14865000
.   : milestone, 14865000,
section candidate
no_agent (15.611 s) : 15611000, 15611000
.   : milestone, 15611000,
appsec (15.039 s) : 15039000, 15039000
.   : milestone, 15039000,
iast (19.1 s) : 19100000, 19100000
.   : milestone, 19100000,
iast_GLOBAL (18.275 s) : 18275000, 18275000
.   : milestone, 18275000,
profiling (15.768 s) : 15768000, 15768000
.   : milestone, 15768000,
tracing (14.993 s) : 14993000, 14993000
.   : milestone, 14993000,
 
 
 
 | 
| 🎯 Code Coverage 🔗 Commit SHA: ea090f9 | Docs | Was this helpful? Give us feedback! | 
| if (UPDATER.get() != null) { | ||
| LOGGER.debug("DebuggerConfigUpdater available, performing update"); | ||
| UPDATER.get().updateConfig(update); | ||
| } else { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATER can be updated in-between, you should get the update in a local variable and call updateConfig(update) from this local variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in d80e531
| return update != null ? update : existing; | ||
| } | ||
|  | ||
| public static final class Builder { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this builder sounds overkill to me, I think we can manage to call the DebuggerConfigUpdate constructor with 4 parameters (null or boolean)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree, in hindsight it is too much logic for its use, addressed in d80e531
| return; | ||
| } | ||
| if (UPDATER.get() != null) { | ||
| LOGGER.debug("DebuggerConfigUpdater available, performing update"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we log the content of update to know what is passed to the update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in d80e531
| public boolean equals(Object o) { | ||
| if (!(o instanceof DebuggerConfigUpdate)) return false; | ||
| DebuggerConfigUpdate that = (DebuggerConfigUpdate) o; | ||
| return Objects.equals(dynamicInstrumentationEnabled, that.dynamicInstrumentationEnabled) | ||
| && Objects.equals(exceptionReplayEnabled, that.exceptionReplayEnabled) | ||
| && Objects.equals(codeOriginEnabled, that.codeOriginEnabled) | ||
| && Objects.equals(distributedDebuggerEnabled, that.distributedDebuggerEnabled); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used somewhere? otherwise we can remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in d80e531
| public int hashCode() { | ||
| return Objects.hash( | ||
| dynamicInstrumentationEnabled, | ||
| exceptionReplayEnabled, | ||
| codeOriginEnabled, | ||
| distributedDebuggerEnabled); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used somewhere? otherwise we can remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in d80e531
| if (failedTestReplayMode) { | ||
| TestContext testContext = InstrumentationTestBridge.getCurrentTestContext(); | ||
| if (testContext == null) { | ||
| return; | ||
| } | ||
| TestExecutionHistory executionHistory = testContext.get(TestExecutionHistory.class); | ||
| if (executionHistory == null || !executionHistory.failedTestReplayApplicable()) { | ||
| return; | ||
| } | ||
| } else { | ||
| if (t instanceof Error) { | ||
| if (LOGGER.isDebugEnabled()) { | ||
| LOGGER.debug("Skip handling error: {}", t.toString()); | ||
| } | ||
| return; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consider to implement a extended version of DefaultExceptionDebugger special for FTR/CIViz?
we can refactor the class to be more extendable and have some specialized methods for the process of FTR?
this sounds better than tangled the dependency to FTR inside this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also avoid to have a internal config param for async config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a good point, addressed in 34cf624 and separated the logic between the default implementation and the FTR specific implementation, should also be easier to maintain and makes the implementation easier to follow 👍
| } | ||
|  | ||
| public void stop() { | ||
| lowRateFlush(this); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case, add
snapshotSink.highRateFlush(null);
also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in d80e531
| private final String service; | ||
| private final DebuggerIntakeRequestData debugger; | ||
| private final String ddsource = "dd_debugger"; | ||
| private final String product; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I have more information about this?
what is the purpose?
is it standard for all tracers, coming from EVP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that snapshots are sent through logs to the backend, in the original Failed Test Replay RFC the team decided that it would be good to have a way of identifying these specific logs (only sent by Failed Test Replay) to avoid billing customers in the future when using the feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK this is only done in JS, but it is also currently the only implementation of Failed Test Replay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's remove this for now, we probably a mechanism to differentiate snapshot origin soon for own needs so you can reuse it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in f1f1264 👍
| private String loggerThreadName; | ||
|  | ||
| public IntakeRequest(String service, DebuggerIntakeRequestData debugger) { | ||
| public IntakeRequest(String service, DebuggerIntakeRequestData debugger, Config config) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure to want pass the config object just for the product attribute...
| maxCapturedFrames); | ||
| config.getDebuggerMaxExceptionPerSecond(), | ||
| config.getDebuggerExceptionMaxCapturedFrames(), | ||
| config.isDebuggerExceptionAsyncConfig()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't want to keep the config param from Config, we can keep it local for the needs of FTR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in f1f1264 to always use the async instrumentation
# Conflicts: # dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/events/TestEventsHandlerImpl.java # dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/execution/RunNTimes.java # dd-java-agent/agent-ci-visibility/src/main/java/datadog/trace/civisibility/execution/RunOnceIgnoreOutcome.java # internal-api/src/main/java/datadog/trace/api/civisibility/execution/TestExecutionHistory.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, can't wait to see this dogfooded!
| } | ||
|  | ||
| maybeStartAppSec(scoClass, sco); | ||
| // start civisibility before debugger to enable Failed Test Replay correctly in headless mode | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we can remove this comment now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The long-awaited smoke test for the headless mode, nice!
| return JAVA_HOME + separator + "bin" + separator + "javac" | ||
| } | ||
|  | ||
| String javaToolOptions(List<String> additionalAgentArgs) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if these can be extracted to CiVisibilitySmokeTest, they seem to look awfully similar for the 3 smoke tests we have
|  | ||
| public static void setUpdater(@Nonnull DebuggerConfigUpdater updater) { | ||
| DebuggerConfigUpdater oldUpdater = UPDATER.getAndSet(updater); | ||
| if (oldUpdater == null) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's mark these two methods as syncrhonized and make the UPDATER field a regular volatile reference (to ensure visibility in the is... methods below).
Given that the methods are called rarely enough (set updater once at the start, update config once at the start and then whenever the service settings are updated ), there shouldn't be contention for the lock and it'll be much easier to reason about the methods' correctness.
Current implementation permits executing updates out of order:
- Update arrives and is set as deferred
- Updater is set, thread is preempted before we process the deferred update
- Another update arrives in a different thread, sees the updater and gets executed
- The thread setting updater resumes executing and applies the deferred update out of order
| DebuggerConfigUpdate toApply = DEFERRED_UPDATE; | ||
| DEFERRED_UPDATE = null; | ||
| LOGGER.debug("Processing deferred update {}", toApply); | ||
| updater.updateConfig(toApply); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: no need to do the local var dance anymore, now that the field is guarded by the lock:
| DebuggerConfigUpdate toApply = DEFERRED_UPDATE; | |
| DEFERRED_UPDATE = null; | |
| LOGGER.debug("Processing deferred update {}", toApply); | |
| updater.updateConfig(toApply); | |
| LOGGER.debug("Processing deferred update {}", toApply); | |
| updater.updateConfig(DEFERRED_UPDATE); | |
| DEFERRED_UPDATE = null; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside the method yes, but unfortunately for other methods it's not true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing with synchronized and volatile need to be done correctly 😄
or you go full synchronized
| public static boolean isDynamicInstrumentationEnabled() { | ||
| if (UPDATER != null) { | ||
| return UPDATER.isDynamicInstrumentationEnabled(); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need local variable, UPDATER field can be updated in the middle
| if (UPDATER != null) { | ||
| return UPDATER.isExceptionReplayEnabled(); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need local variable, UPDATER field can be updated in the middle
| if (UPDATER != null) { | ||
| return UPDATER.isCodeOriginEnabled(); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need local variable, UPDATER field can be updated in the middle
| if (UPDATER != null) { | ||
| return UPDATER.isDistributedDebuggerEnabled(); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need local variable, UPDATER field can be updated in the middle
| // for testing purposes | ||
| static void reset() { | ||
| UPDATER = null; | ||
| DEFERRED_UPDATE = null; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even this can be problematic for test. the null assigned to DEFERRED_UPDATE can be non-visible for other threads
| 
 True, my assumption was that  | 
| fixed in ea090f9, thank you both for the suggestions 👍! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from debugger perspective
What Does This Do
Implements Test Optimization's Failed Test Replay using Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic will create a probe for the exception thrown (in the case of the test probably an assertion error, but not limited to it). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If the snapshot is captured, it is send as a log to Datadog. The following modifications were made to Exception Replay's original implementation:
DebuggerConfigBridge. It handles deferred updates, so no order dependency on startup is introduced between the products that want to use Live Debugger's features.FailedTestReplayExceptionDebuggerwas created to support the feature. It will:Errors, which were previously ignored.productfield to snapshots, populated withtest_optimizationif Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.DD_CIVISIBILITY_AGENTLESS_ENABLEDis set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.DebuggerSinknow flushes on closing to avoid snapshots not being sent on test session finish.Additional changes:
BackendApiFactory.Intaketo a standaloneIntake, given that it is useful in order to compute agentless mode URLs.failed_test_replayin test frameworks that support Auto Test Retries.di_enabledto the Settings response and telemetry.Validation:
MavenSmokeTestnow has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.JUnitConsoleSmokeTestto validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.Motivation
Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.
Contributor Checklist
type:and (comp:orinst:) labels in addition to any usefull labelsclose,fixor any linking keywords when referencing an issue.Use
solvesinstead, and assign the PR milestone to the issueJira ticket: SDTEST-2242