
alopezz (Contributor) commented Aug 25, 2025

What does this PR do?

It makes it possible to build a custom ddot (byoc) binary that targets a broader compatibility base (defined by the version of glibc available on the OS) by letting the user choose the Ubuntu version used as the base of the build image, via --build-arg UBUNTU_VERSION=xx.xx.
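A minimal sketch of the resulting invocation, wrapped in a hypothetical shell function (build_ddot_byoc is not part of this PR). The --target artifact and --output flags are taken from the CI job this PR adds; it assumes the build context directory holds Dockerfile.agent-otel and manifest.yaml, which is how that job lays out /tmp/otel-ci. Any further build args the CI job also passes (AGENT_BRANCH, AGENT_VERSION) can be appended as extra arguments.

```shell
# Hypothetical helper (not part of the PR): build the byoc artifact on a chosen Ubuntu base.
# Assumes $2 is a build context containing Dockerfile.agent-otel and manifest.yaml.
# Set DOCKER=echo to print the docker command instead of running it.
build_ddot_byoc() {
  ubuntu_version=$1   # e.g. 20.04; consumed by FROM ubuntu:${UBUNTU_VERSION} in the Dockerfile
  context_dir=$2
  shift 2             # remaining args are passed through, e.g. --build-arg AGENT_VERSION=...
  ${DOCKER:-docker} build \
    --target artifact \
    --output type=local,dest=./ \
    --build-arg UBUNTU_VERSION="$ubuntu_version" \
    "$@" \
    -f "$context_dir/Dockerfile.agent-otel" \
    "$context_dir"
}
```

For example, build_ddot_byoc 20.04 ./otel-ci --build-arg AGENT_VERSION=<agent version> would write the artifact (ddot-byoc/otel-agent in the CI job) to the current directory.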

Motivation

Allow more users to benefit from the ddot byoc workflow.

Describe how you validated your changes

Added a job to the pipeline that runs the build with an Ubuntu 20.04 base image and checks that the maximum glibc version required by the resulting binary does not exceed the version shipped with that Ubuntu release (2.31).
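The check boils down to scanning the binary's dynamic symbol table for versioned glibc symbols and keeping the highest version. A minimal sketch of that extraction as a reusable function (the name max_glibc_from_symbols is hypothetical); it reads objdump -T output on stdin so it can be exercised without a real binary, and its grep/sort pipeline mirrors the one in the CI job.

```shell
# Print the highest GLIBC_x.y[.z] symbol version found in `objdump -T` output on stdin.
# Prints nothing when the input has no versioned glibc symbols.
max_glibc_from_symbols() {
  grep -o 'GLIBC_[0-9][0-9.]*' |  # keep only tokens like GLIBC_2.17 (skips GLIBC_PRIVATE)
    sed 's/^GLIBC_//' |           # strip the prefix, leaving the bare version
    sort -V |                     # version-aware ordering: 2.2.5 < 2.3.2 < 2.17
    tail -n 1                     # the last line is the maximum required version
}
```

Against the build output this would be used as objdump -T ddot-byoc/otel-agent | max_glibc_from_symbols.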

Possible Drawbacks / Trade-offs

  • Not very user-friendly, but it is a quick way to provide this compatibility.

Additional Notes

This pins the dda version in use and switches to its "standalone" version so that we don't need to worry about the Python version or Python environment management.

In practice, with Ubuntu 20.04 as the base, the maximum glibc version the binary requires already drops to 2.17, which makes it at least as compatible as the regular Agent.
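Deciding whether a detected requirement such as 2.17 fits under a limit such as 2.31 is a version comparison rather than a numeric one: read as decimals, 2.9 would rank above 2.17, which is wrong for glibc. A minimal sketch of the "exceeds" test using the same sort -V trick as the CI job (the function name glibc_exceeds is hypothetical).

```shell
# Exit 0 when version $1 is strictly greater than version $2 under `sort -V` ordering.
glibc_exceeds() {
  # Equal versions never exceed; otherwise $1 exceeds $2 if it sorts last.
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

With the numbers from this PR, glibc_exceeds 2.17 2.31 fails (the binary fits), while a binary needing 2.34 would be rejected against the 2.31 limit.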

We eventually plan to package C/C++ toolchains that target the same compatibility baseline as the Agent. That would probably give us a better way to provide the necessary compatibility without having to change the base image as is done here.

@github-actions github-actions bot added the short review PR is simple enough to be reviewed quickly label Aug 25, 2025
@alopezz alopezz added changelog/no-changelog qa/no-code-change No code change in Agent code requiring validation labels Aug 25, 2025
agent-platform-auto-pr bot (Contributor) commented Aug 25, 2025

Gitlab CI Configuration Changes

Added Jobs

ddot_byoc_binary_build_test_ubuntu2004
ddot_byoc_binary_build_test_ubuntu2004:
  before_script:
  - mkdir -p /tmp/otel-ci
  - cp comp/otelcol/collector-contrib/impl/manifest.yaml /tmp/otel-ci/
  - cp Dockerfiles/agent-ddot/Dockerfile.agent-otel /tmp/otel-ci/
  - cp test/integration/docker/otel_agent_build_tests.py /tmp/otel-ci/
  - wget https://github.com/mikefarah/yq/releases/download/3.4.1/yq_linux_amd64 -O
    /usr/bin/yq && chmod +x /usr/bin/yq
  - export OTELCOL_VERSION=v$(/usr/bin/yq r /tmp/otel-ci/manifest.yaml dist.version)
  - yq w -i /tmp/otel-ci/manifest.yaml "receivers[+] gomod" "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver
    ${OTELCOL_VERSION}"
  - yq w -i /tmp/otel-ci/manifest.yaml "processors[+] gomod" "github.com/open-telemetry/opentelemetry-collector-contrib/processor/metricstransformprocessor
    ${OTELCOL_VERSION}"
  image: registry.ddbuild.io/ci/datadog-agent-buildimages/docker_x64$CI_IMAGE_DOCKER_X64_SUFFIX:$CI_IMAGE_DOCKER_X64
  needs:
  - integration_tests_otel
  rules:
  - if: $CI_COMMIT_BRANCH =~ /^mq-working-branch-/
    when: never
  - when: on_success
  script:
  - DOCKER_LOGIN=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO user)
    || exit $?
  - $CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO token | docker login
    --username "$DOCKER_LOGIN" --password-stdin "$DOCKER_REGISTRY_URL"
  - EXIT="${PIPESTATUS[0]}"; if [ $EXIT -ne 0 ]; then echo "Unable to locate credentials
    needs gitlab runner restart"; exit $EXIT; fi
  - AGENT_VERSION=$(dda inv -- agent.version --no-include-git --no-include-pre)
  - |
    docker build \
      --target artifact \
      --output type=local,dest=./ \
      --build-arg AGENT_BRANCH=$CI_COMMIT_REF_NAME \
      --build-arg AGENT_VERSION=$AGENT_VERSION \
      --build-arg UBUNTU_VERSION=20.04 \
      -f /tmp/otel-ci/Dockerfile.agent-otel /tmp/otel-ci
  - |
    BIN_PATH=ddot-byoc/otel-agent
    if [ ! -f "$BIN_PATH" ]; then
      echo "ERROR: Expected otel-agent binary not found in output directory!" >&2
      exit 1
    fi
  - |
    ALLOWED_GLIBC=2.31
    REQUIRED_GLIBC=$(objdump -T "$BIN_PATH" 2>/dev/null | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/[^0-9.]//g' | sort -V | tail -1)
    echo "Detected required GLIBC version: ${REQUIRED_GLIBC:-unknown} (allowed max: $ALLOWED_GLIBC)"
    if [ -z "$REQUIRED_GLIBC" ]; then
      echo "WARNING: Could not detect GLIBC requirement from binary; proceeding without hard failure." >&2
    else
      # This amounts to reporting an error when $REQUIRED_GLIBC > $ALLOWED_GLIBC
      highest_version=$(printf '%s\n%s\n' "$REQUIRED_GLIBC" "$ALLOWED_GLIBC" | sort -V | tail -1)
      if [[ "$REQUIRED_GLIBC" != "$ALLOWED_GLIBC" && "$highest_version" == "$REQUIRED_GLIBC" ]]; then
        echo "ERROR: otel-agent requires GLIBC_$REQUIRED_GLIBC which exceeds expected GLIBC_$ALLOWED_GLIBC" >&2
        objdump -T "$BIN_PATH" | grep 'GLIBC'
        exit 1
      fi
    fi
  stage: integration_test
  tags:
  - docker-in-docker:amd64

Changes Summary

| Removed | Modified | Added | Renamed |
| --- | --- | --- | --- |
| 0 | 0 | 1 | 0 |

ℹ️ Diff available in the job log.

cit-pr-commenter bot commented Aug 25, 2025

Regression Detector

Regression Detector Results

Run ID: ed4ed290-9567-448e-9224-3447a235b1ea

Baseline: d004c1d
Comparison: dcb55f0

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
|  | docker_containers_cpu | % cpu utilization | -3.12 | [-6.20, -0.05] | 1 | Logs |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
|  | quality_gate_metrics_logs | memory utilization | +2.50 | [+2.16, +2.84] | 1 | Logs, bounds checks dashboard |
|  | otlp_ingest_metrics | memory utilization | +0.84 | [+0.69, +0.98] | 1 | Logs |
|  | quality_gate_logs | % cpu utilization | +0.82 | [-1.96, +3.60] | 1 | Logs, bounds checks dashboard |
|  | otlp_ingest_logs | memory utilization | +0.72 | [+0.56, +0.87] | 1 | Logs |
|  | ddot_metrics | memory utilization | +0.57 | [+0.36, +0.78] | 1 | Logs |
|  | file_to_blackhole_100ms_latency | egress throughput | +0.06 | [-0.57, +0.69] | 1 | Logs |
|  | file_tree | memory utilization | +0.05 | [+0.02, +0.09] | 1 | Logs |
|  | file_to_blackhole_0ms_latency | egress throughput | +0.04 | [-0.55, +0.62] | 1 | Logs |
|  | file_to_blackhole_1000ms_latency | egress throughput | +0.01 | [-0.57, +0.60] | 1 | Logs |
|  | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.04, +0.06] | 1 | Logs |
|  | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.06, +0.07] | 1 | Logs |
|  | file_to_blackhole_500ms_latency | egress throughput | -0.01 | [-0.61, +0.58] | 1 | Logs |
|  | quality_gate_idle | memory utilization | -0.11 | [-0.14, -0.07] | 1 | Logs, bounds checks dashboard |
|  | quality_gate_idle_all_features | memory utilization | -0.24 | [-0.27, -0.20] | 1 | Logs, bounds checks dashboard |
|  | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.28 | [-0.31, -0.24] | 1 | Logs |
|  | ddot_logs | memory utilization | -0.46 | [-0.57, -0.35] | 1 | Logs |
|  | tcp_syslog_to_blackhole | ingress throughput | -1.38 | [-1.45, -1.31] | 1 | Logs |
|  | docker_containers_memory | memory utilization | -2.52 | [-2.68, -2.36] | 1 | Logs |
|  | docker_containers_cpu | % cpu utilization | -3.12 | [-6.20, -0.05] | 1 | Logs |

Bounds Checks: ❌ Failed

| perf | experiment | bounds_check_name | replicates_passed | links |
| --- | --- | --- | --- | --- |
|  | docker_containers_cpu | simple_check_run | 9/10 |  |
|  | docker_containers_memory | memory_usage | 10/10 |  |
|  | docker_containers_memory | simple_check_run | 8/10 |  |
|  | file_to_blackhole_0ms_latency | lost_bytes | 10/10 |  |
|  | file_to_blackhole_0ms_latency | memory_usage | 10/10 |  |
|  | file_to_blackhole_1000ms_latency | memory_usage | 10/10 |  |
|  | file_to_blackhole_100ms_latency | lost_bytes | 10/10 |  |
|  | file_to_blackhole_100ms_latency | memory_usage | 10/10 |  |
|  | file_to_blackhole_500ms_latency | lost_bytes | 10/10 |  |
|  | file_to_blackhole_500ms_latency | memory_usage | 10/10 |  |
|  | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
|  | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
|  | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
|  | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
|  | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
|  | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
|  | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
|  | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
|  | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
|  | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
|  | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
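The three criteria can be restated as a small shell predicate, which makes it easy to see why, for example, docker_containers_cpu (-3.12%, CI [-6.20, -0.05]) is not flagged even though its interval excludes zero. This is a hypothetical sketch for reading the tables above, not the Regression Detector's actual implementation; the function name and argument order are assumptions.

```shell
# Hypothetical restatement of the regression criteria (not the detector's real code).
# Args: delta_mean_pct ci_low ci_high erratic(true|false). Exit 0 if flagged as a regression.
is_regression() {
  [ "$4" = "true" ] && return 1           # criterion 3: erratic experiments are ignored
  awk -v d="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    a = d + 0; if (a < 0) a = -a           # absolute value of the delta mean percentage
    big = (a >= 5.00)                      # criterion 1: effect size tolerance of 5.00%
    excludes_zero = (lo + 0 > 0 || hi + 0 < 0)  # criterion 2: the 90% CI does not contain zero
    exit !(big && excludes_zero)           # awk exit 0 means "regression"
  }'
}
```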

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

agent-platform-auto-pr bot (Contributor) commented Aug 25, 2025

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor b4df3a3

Successful checks

Info

| Quality gate | Delta | On disk size (MiB) | Delta | On wire size (MiB) |
| --- | --- | --- | --- | --- |
| agent_deb_amd64 | DataNotFound | 700.38 < 704.84 | DataNotFound | 177.18 < 178.58 |
| agent_deb_amd64_fips | DataNotFound | 687.67 < 703.09 | DataNotFound | 174.04 < 178.12 |
| agent_heroku_amd64 | DataNotFound | 350.32 < 355.37 | DataNotFound | 93.97 < 95.72 |
| agent_msi | DataNotFound | 975.56 < 980.61 | DataNotFound | 150.19 < 152.67 |
| agent_rpm_amd64 | DataNotFound | 700.37 < 704.83 | DataNotFound | 179.06 < 180.22 |
| agent_rpm_amd64_fips | DataNotFound | 687.66 < 703.08 | DataNotFound | 175.73 < 179.85 |
| agent_rpm_arm64 | DataNotFound | 689.41 < 694.74 | DataNotFound | 161.64 < 163.96 |
| agent_rpm_arm64_fips | DataNotFound | 677.51 < 693.05 | DataNotFound | 158.43 < 163.0 |
| agent_suse_amd64 | DataNotFound | 700.37 < 704.83 | DataNotFound | 179.06 < 180.22 |
| agent_suse_amd64_fips | DataNotFound | 687.66 < 703.08 | DataNotFound | 175.73 < 179.85 |
| agent_suse_arm64 | DataNotFound | 689.41 < 694.74 | DataNotFound | 161.64 < 163.96 |
| agent_suse_arm64_fips | DataNotFound | 677.51 < 693.05 | DataNotFound | 158.43 < 163.0 |
| docker_agent_amd64 | DataNotFound | 782.79 < 788.65 | DataNotFound | 269.89 < 272.01 |
| docker_agent_arm64 | DataNotFound | 795.16 < 802.0 | DataNotFound | 256.68 < 259.7 |
| docker_agent_jmx_amd64 | DataNotFound | 973.66 < 979.84 | DataNotFound | 338.55 < 340.95 |
| docker_agent_jmx_arm64 | DataNotFound | 974.63 < 981.8 | DataNotFound | 321.26 < 324.65 |
| docker_cluster_agent_amd64 | DataNotFound | 212.63 < 214.5 | DataNotFound | 72.2 < 73.51 |
| docker_cluster_agent_arm64 | DataNotFound | 228.54 < 230.33 | DataNotFound | 68.47 < 69.77 |
| docker_cws_instrumentation_amd64 | DataNotFound | 7.07 < 7.12 | DataNotFound | 2.95 < 3.29 |
| docker_cws_instrumentation_arm64 | DataNotFound | 6.69 < 6.92 | DataNotFound | 2.7 < 3.07 |
| docker_dogstatsd_amd64 | DataNotFound | 38.48 < 39.57 | DataNotFound | 14.89 < 15.76 |
| docker_dogstatsd_arm64 | DataNotFound | 37.16 < 38.2 | DataNotFound | 14.34 < 14.83 |
| dogstatsd_deb_amd64 | DataNotFound | 29.71 < 31.4 | DataNotFound | 7.85 < 8.95 |
| dogstatsd_deb_arm64 | DataNotFound | 28.3 < 29.97 | DataNotFound | 6.81 < 7.89 |
| dogstatsd_rpm_amd64 | DataNotFound | 29.71 < 31.4 | DataNotFound | 7.86 < 8.96 |
| dogstatsd_suse_amd64 | DataNotFound | 29.71 < 31.4 | DataNotFound | 7.86 < 8.96 |
| iot_agent_deb_amd64 | DataNotFound | 54.13 < 54.55 | DataNotFound | 13.65 < 14.45 |
| iot_agent_deb_arm64 | DataNotFound | 51.45 < 51.9 | DataNotFound | 11.81 < 12.63 |
| iot_agent_deb_armhf | DataNotFound | 51.0 < 51.42 | DataNotFound | 11.88 < 12.74 |
| iot_agent_rpm_amd64 | DataNotFound | 54.13 < 54.55 | DataNotFound | 13.67 < 14.47 |
| iot_agent_suse_amd64 | DataNotFound | 54.13 < 54.55 | DataNotFound | 13.67 < 14.47 |

@alopezz alopezz force-pushed the alopez/ddot-byoc-custom-base-ubuntu branch 2 times, most recently from ffb6822 to a0d95ba August 26, 2025 09:03
@alopezz alopezz marked this pull request as ready for review August 26, 2025 09:25
@alopezz alopezz requested review from a team as code owners August 26, 2025 09:25
@alopezz alopezz requested a review from truthbk August 26, 2025 09:25
@alopezz alopezz added the ask-review Ask required teams to review this PR label Aug 26, 2025
@alopezz alopezz force-pushed the alopez/ddot-byoc-custom-base-ubuntu branch from a0d95ba to dcb55f0 August 26, 2025 12:44
liustanley (Contributor) left a comment

LGTM, just one question on the Ubuntu versions

  # Use the Ubuntu Slim AMD64 base image
- FROM ubuntu:24.04 AS builder
+ FROM ubuntu:${UBUNTU_VERSION} AS builder
Contributor

Do we have to add tests for Ubuntu versions being used here other than 24.04 and 20.04, or are these the only versions with functional differences? I'm also wondering if we should restrict the versions that can be used with this script.

Contributor Author

I don't think it's worth being exhaustive here. The further back we go in versions, the higher the risk of something in the Dockerfile breaking (that's why I had to change how dda is installed, for instance).

W.r.t. restrictions, the fact that we're relying on the user running docker build directly here doesn't leave a lot of room for logic. We could just use 20.04 all the time for the builder, but it's probably not good to default to an EOL version for everyone. I think we can always adjust this later, though, and we do have some plans to make it easier to use the same toolchain that the Agent uses for this, which could remove the need to specify this altogether.
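For illustration only: if a thin wrapper script were ever put in front of docker build, the restriction raised in the review could be a simple allowlist check. This PR does not restrict UBUNTU_VERSION; the function name and the listed versions below are hypothetical and do not reflect a supported-versions policy.

```shell
# Hypothetical wrapper-side check; the allowlist is an illustrative assumption.
ALLOWED_UBUNTU_VERSIONS="20.04 22.04 24.04"

validate_ubuntu_version() {
  for v in $ALLOWED_UBUNTU_VERSIONS; do   # unquoted on purpose: split the list on spaces
    [ "$1" = "$v" ] && return 0
  done
  echo "unsupported UBUNTU_VERSION: $1 (expected one of: $ALLOWED_UBUNTU_VERSIONS)" >&2
  return 1
}
```

Keeping the list in one variable means extending support is a one-line change, though as noted above, older bases may also need Dockerfile adjustments.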

@github-actions github-actions bot added medium review PR review might take time and removed short review PR is simple enough to be reviewed quickly labels Aug 28, 2025
alopezz (Contributor Author) commented Aug 28, 2025

/merge

dd-devflow-routing-codex bot commented Aug 28, 2025

2025-08-28 08:10:56 UTC ℹ️ Start processing command /merge


2025-08-28 08:11:02 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 39m (p90).


2025-08-28 08:25:37 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 3eccd0b into main Aug 28, 2025
307 checks passed
@dd-mergequeue dd-mergequeue bot deleted the alopez/ddot-byoc-custom-base-ubuntu branch August 28, 2025 08:25
@github-actions github-actions bot added this to the 7.71.0 milestone Aug 28, 2025
Labels
ask-review Ask required teams to review this PR changelog/no-changelog medium review PR review might take time qa/no-code-change No code change in Agent code requiring validation

3 participants