Allow older Ubuntu versions to be used as base for building ddot-byoc #40218
Gitlab CI Configuration Changes
Added Jobs
ddot_byoc_binary_build_test_ubuntu2004
ddot_byoc_binary_build_test_ubuntu2004:
  before_script:
  - mkdir -p /tmp/otel-ci
  - cp comp/otelcol/collector-contrib/impl/manifest.yaml /tmp/otel-ci/
  - cp Dockerfiles/agent-ddot/Dockerfile.agent-otel /tmp/otel-ci/
  - cp test/integration/docker/otel_agent_build_tests.py /tmp/otel-ci/
  - wget https://github.com/mikefarah/yq/releases/download/3.4.1/yq_linux_amd64 -O
    /usr/bin/yq && chmod +x /usr/bin/yq
  - export OTELCOL_VERSION=v$(/usr/bin/yq r /tmp/otel-ci/manifest.yaml dist.version)
  - yq w -i /tmp/otel-ci/manifest.yaml "receivers[+] gomod" "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver
    ${OTELCOL_VERSION}"
  - yq w -i /tmp/otel-ci/manifest.yaml "processors[+] gomod" "github.com/open-telemetry/opentelemetry-collector-contrib/processor/metricstransformprocessor
    ${OTELCOL_VERSION}"
  image: registry.ddbuild.io/ci/datadog-agent-buildimages/docker_x64$CI_IMAGE_DOCKER_X64_SUFFIX:$CI_IMAGE_DOCKER_X64
  needs:
  - integration_tests_otel
  rules:
  - if: $CI_COMMIT_BRANCH =~ /^mq-working-branch-/
    when: never
  - when: on_success
  script:
  - DOCKER_LOGIN=$($CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO user)
    || exit $?
  - $CI_PROJECT_DIR/tools/ci/fetch_secret.sh $DOCKER_REGISTRY_RO token | docker login
    --username "$DOCKER_LOGIN" --password-stdin "$DOCKER_REGISTRY_URL"
  - EXIT="${PIPESTATUS[0]}"; if [ $EXIT -ne 0 ]; then echo "Unable to locate credentials
    needs gitlab runner restart"; exit $EXIT; fi
  - AGENT_VERSION=$(dda inv -- agent.version --no-include-git --no-include-pre)
  - "docker build \\\n --target artifact \\\n --output type=local,dest=./ \\\n \
    \ --build-arg AGENT_BRANCH=$CI_COMMIT_REF_NAME \\\n --build-arg AGENT_VERSION=$AGENT_VERSION\
    \ \\\n --build-arg UBUNTU_VERSION=20.04 \\\n -f /tmp/otel-ci/Dockerfile.agent-otel\
    \ /tmp/otel-ci\n"
  - "BIN_PATH=ddot-byoc/otel-agent\nif [ ! -f \"$BIN_PATH\" ]; then\n echo \"ERROR:\
    \ Expected otel-agent binary not found in output directory!\" >&2\n exit 1\n\
    fi\n"
  - "ALLOWED_GLIBC=2.31\nREQUIRED_GLIBC=$(objdump -T \"$BIN_PATH\" 2>/dev/null | grep\
    \ -o 'GLIBC_[0-9][0-9.]*' | sed 's/[^0-9.]//g' | sort -V | tail -1)\necho \"Detected\
    \ required GLIBC version: ${REQUIRED_GLIBC:-unknown} (allowed max: $ALLOWED_GLIBC)\"\
    \nif [ -z \"$REQUIRED_GLIBC\" ]; then\n echo \"WARNING: Could not detect GLIBC\
    \ requirement from binary; proceeding without hard failure.\" >&2\nelse\n # This\
    \ amounts to reporting an error when $REQUIRED_GLIBC > $ALLOWED_GLIBC\n highest_version=$(printf\
    \ '%s\\n%s\\n' \"$REQUIRED_GLIBC\" \"$ALLOWED_GLIBC\" | sort -V | tail -1)\n \
    \ if [[ \"$REQUIRED_GLIBC\" != \"$ALLOWED_GLIBC\" && \"$highest_version\" == \"\
    $REQUIRED_GLIBC\" ]]; then\n echo \"ERROR: otel-agent requires GLIBC_$REQUIRED_GLIBC\
    \ which exceeds expected GLIBC_$ALLOWED_GLIBC\" >&2\n objdump -T \"$BIN_PATH\"\
    \ | grep 'GLIBC'\n exit 1\n fi\nfi\n"
  stage: integration_test
  tags:
  - docker-in-docker:amd64
Changes Summary
ℹ️ Diff available in the job log.
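The last script step of the job is hard to read in its escaped form. As a rough sketch, its comparison logic could be isolated into a helper like the one below. The function name `glibc_exceeds` is hypothetical and not part of the PR; it assumes a `sort` that supports `-V` (GNU coreutils), as the job itself does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): succeeds (exit 0) only when the
# required glibc version is strictly greater than the allowed maximum.
glibc_exceeds() {
  local required="$1" allowed="$2" highest
  # Equal versions never exceed the limit.
  [ "$required" = "$allowed" ] && return 1
  # sort -V orders version strings numerically per component,
  # so 2.4 sorts before 2.31.
  highest=$(printf '%s\n%s\n' "$required" "$allowed" | sort -V | tail -1)
  [ "$highest" = "$required" ]
}
```

For example, `glibc_exceeds 2.34 2.31` succeeds (the check in the job would fail the build), while `glibc_exceeds 2.17 2.31` does not.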
Regression Detector Results
Baseline: d004c1d
Optimization Goals: ✅ No significant changes detected
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | docker_containers_cpu | % cpu utilization | -3.12 | [-6.20, -0.05] | 1 | Logs |
Fine details of change detection per experiment
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | quality_gate_metrics_logs | memory utilization | +2.50 | [+2.16, +2.84] | 1 | Logs bounds checks dashboard |
➖ | otlp_ingest_metrics | memory utilization | +0.84 | [+0.69, +0.98] | 1 | Logs |
➖ | quality_gate_logs | % cpu utilization | +0.82 | [-1.96, +3.60] | 1 | Logs bounds checks dashboard |
➖ | otlp_ingest_logs | memory utilization | +0.72 | [+0.56, +0.87] | 1 | Logs |
➖ | ddot_metrics | memory utilization | +0.57 | [+0.36, +0.78] | 1 | Logs |
➖ | file_to_blackhole_100ms_latency | egress throughput | +0.06 | [-0.57, +0.69] | 1 | Logs |
➖ | file_tree | memory utilization | +0.05 | [+0.02, +0.09] | 1 | Logs |
➖ | file_to_blackhole_0ms_latency | egress throughput | +0.04 | [-0.55, +0.62] | 1 | Logs |
➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.01 | [-0.57, +0.60] | 1 | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.04, +0.06] | 1 | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.06, +0.07] | 1 | Logs |
➖ | file_to_blackhole_500ms_latency | egress throughput | -0.01 | [-0.61, +0.58] | 1 | Logs |
➖ | quality_gate_idle | memory utilization | -0.11 | [-0.14, -0.07] | 1 | Logs bounds checks dashboard |
➖ | quality_gate_idle_all_features | memory utilization | -0.24 | [-0.27, -0.20] | 1 | Logs bounds checks dashboard |
➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.28 | [-0.31, -0.24] | 1 | Logs |
➖ | ddot_logs | memory utilization | -0.46 | [-0.57, -0.35] | 1 | Logs |
➖ | tcp_syslog_to_blackhole | ingress throughput | -1.38 | [-1.45, -1.31] | 1 | Logs |
➖ | docker_containers_memory | memory utilization | -2.52 | [-2.68, -2.36] | 1 | Logs |
➖ | docker_containers_cpu | % cpu utilization | -3.12 | [-6.20, -0.05] | 1 | Logs |
Bounds Checks: ❌ Failed
perf | experiment | bounds_check_name | replicates_passed | links |
---|---|---|---|---|
❌ | docker_containers_cpu | simple_check_run | 9/10 | |
✅ | docker_containers_memory | memory_usage | 10/10 | |
❌ | docker_containers_memory | simple_check_run | 8/10 | |
✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
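The three criteria above can be sketched as a small shell predicate. This is only an illustration of the stated rule, not the detector's actual implementation; the function name `is_regression` and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the decision rule described above.
# Arguments: delta_mean_pct ci_low ci_high [erratic=true|false]
# Succeeds (exit 0) when the change would be flagged as a regression.
is_regression() {
  local delta="$1" ci_low="$2" ci_high="$3" erratic="${4:-false}"
  # Experiments marked "erratic" are never flagged.
  [ "$erratic" = "true" ] && return 1
  awk -v d="$delta" -v lo="$ci_low" -v hi="$ci_high" -v tol=5.00 'BEGIN {
    d += 0; lo += 0; hi += 0               # force numeric interpretation
    abs_d = (d < 0) ? -d : d               # |delta mean %|
    excludes_zero = (lo > 0 || hi < 0)     # CI does not contain zero
    exit !(abs_d >= tol && excludes_zero)  # exit 0 means "regression"
  }'
}
```

With the docker_containers_cpu row from the table (`is_regression -3.12 -6.20 -0.05`), the interval excludes zero but the effect size is below 5.00%, so the predicate does not flag it.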
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
Static quality checks
✅ Please find below the results from static quality gates
ffb6822 to a0d95ba
a0d95ba to dcb55f0
LGTM, just one question on the Ubuntu versions
 # Use the Ubuntu Slim AMD64 base image
-FROM ubuntu:24.04 AS builder
+FROM ubuntu:${UBUNTU_VERSION} AS builder
Do we have to add tests for Ubuntu versions being used here other than 24.04 and 20.04, or are these the only versions with functional differences? I'm also wondering if we should restrict the versions that can be used with this script.
I don't think it's worth being exhaustive here. The further back we go in versions, the more risk there is of something in the Dockerfile breaking (that's why I had to change how dda is installed, for instance).
W.r.t. restrictions, the fact that we're relying on the user running docker build directly here doesn't leave a lot of room for logic. We could just use 20.04 all the time for the builder, but it's probably not good to default to an EOL version for everyone. I think we can always adjust this later, though, and we do have some plans to make it easier to use the same toolchain that the Agent uses for this, which could remove the need to specify this altogether.
/merge
What does this PR do?
It makes it possible to build a custom ddot (byoc) that targets a broader compatibility base (defined by the glibc version available on the OS), by allowing the choice of the Ubuntu base version via --build-arg UBUNTU_VERSION=xx.xx.
Motivation
Allow more users to benefit from the ddot byoc workflow.
Describe how you validated your changes
Added a job to the pipeline which runs the build on Ubuntu 20.04 and checks that the maximum glibc version required by the binary doesn't exceed what would be expected for that version of Ubuntu.
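To try the same build outside CI, the command from the job can be assembled by hand. The sketch below only prints the command so it can be reviewed before running. The function name `ddot_byoc_build_cmd` is hypothetical, and the branch and version values in the usage note are placeholders. The flags themselves are the ones used by the CI job; `--output type=local` requires a Docker setup with BuildKit.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not in the PR): prints the docker build
# command used by the CI job for a given Ubuntu base version.
# Arguments: ubuntu_version agent_branch agent_version context_dir
ddot_byoc_build_cmd() {
  local ubuntu_version="$1" agent_branch="$2" agent_version="$3" context_dir="$4"
  echo "docker build --target artifact --output type=local,dest=./" \
    "--build-arg AGENT_BRANCH=${agent_branch}" \
    "--build-arg AGENT_VERSION=${agent_version}" \
    "--build-arg UBUNTU_VERSION=${ubuntu_version}" \
    "-f ${context_dir}/Dockerfile.agent-otel ${context_dir}"
}
```

For instance, `ddot_byoc_build_cmd 20.04 main 7.70.0 /tmp/otel-ci` prints the command for an Ubuntu 20.04 builder; `main` and `7.70.0` are made-up example values.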
Possible Drawbacks / Trade-offs
Additional Notes
This pins the dda version in use and switches to its "standalone" version such that we don't need to worry about the Python version or Python env management.
In practice, by dropping to Ubuntu 20.04, we currently already see the glibc requirement drop to 2.17 max, which makes it at least as compatible as the regular Agent.
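A quick way to check that figure on a locally built binary is to reuse the extraction pipeline from the CI job. The sketch below wraps it in a filter that reads `objdump -T` output on stdin; the function name `max_glibc_version` is an assumption, and `sort -V` is assumed to be available as in the job.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not in the PR): reads `objdump -T` output on stdin
# and prints the highest GLIBC symbol version it references.
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' |  # keep only the version tags
    sed 's/[^0-9.]//g' |          # strip the "GLIBC_" prefix
    sort -V |                     # version-aware sort
    tail -1                       # highest version last
}
```

Typical usage would be `objdump -T ddot-byoc/otel-agent | max_glibc_version`, where the path is the one the CI job expects for the built binary. Empty output means no GLIBC version tags were found.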
We eventually plan to package our C/C++ toolchains that target the compatibility that the Agent supports, which would probably give us a better way to provide the necessary compatibility without the need to change the base image as done here.