✨ OPRUN-4240: Add test profiling toolchain for heap and CPU analysis #2298

tmshort · 2025-11-03T15:22:04Z

✨ Add e2e profiling toolchain for heap and CPU analysis
Introduces automated memory and CPU profiling for e2e tests with:

Automatic port-forwarding to Kubernetes deployment pprof endpoints
Configurable periodic heap and CPU profile collection with differential timing
Analysis report generation with growth metrics and top allocators
Makefile targets: start-profiling, stop-profiling, analyze-profiles

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]
Signed-off-by: Todd Short [email protected]

Reviewer Checklist

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

netlify · 2025-11-03T15:22:09Z

✅ Deploy Preview for olmv1 ready!

Name	Link
🔨 Latest commit	`a75a2b0`
🔍 Latest deploy log	https://app.netlify.com/projects/olmv1/deploys/690e5d0a93e3aa00082b8c74
😎 Deploy Preview	https://deploy-preview-2298--olmv1.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

codecov · 2025-11-03T15:34:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.21%. Comparing base (9530175) to head (a75a2b0).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2298      +/-   ##
==========================================
- Coverage   74.24%   74.21%   -0.03%     
==========================================
  Files          91       91              
  Lines        7046     7046              
==========================================
- Hits         5231     5229       -2     
- Misses       1402     1403       +1     
- Partials      413      414       +1

Flag	Coverage Δ
e2e	`45.86% <ø> (-0.04%)`	⬇️
experimental-e2e	`48.17% <ø> (-0.04%)`	⬇️
unit	`58.58% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pedjak

Given that all other interactions with the code are happening through Makefile targets, it would be nice if we can continue doing that for the profiling as well, something like:

make baseline-profile test-experiment-e2e

or maybe

make test-experiment-e2e PROFILING=baseline

Similar goes for other added scripts.

hack/tools/e2e-profiling/collect-profiles.sh

pedjak · 2025-11-04T11:56:10Z

hack/tools/e2e-profiling/collect-profiles.sh

+    # Get pod name
+    log_info "Finding pod with label: ${POD_LABEL}"
+    POD=$(kubectl get pod -n "${NAMESPACE}" -l "${POD_LABEL}" -o name | head -1)
+    if [ -z "${POD}" ]; then
+        log_error "No pod found with label ${POD_LABEL}"
+        exit 1
+    fi
+    log_success "Found pod: ${POD}"
+    PODS[$component]="${POD}"
+
+    # Create component output directory
+    mkdir -p "${OUTPUT_DIR}/${component}"
+
+    # Set up port forwarding
+    log_info "Setting up port forwarding to ${POD}:${PPROF_PORT} -> localhost:${LOCAL_PORT}..."
+    kubectl port-forward -n "${NAMESPACE}" "${POD}" "${LOCAL_PORT}:${PPROF_PORT}" > "${OUTPUT_DIR}/${component}/port-forward.log" 2>&1 &
+    PF_PIDS[$component]=$!
+    log_info "Port-forward started (PID: ${PF_PIDS[$component]})"


we could set port-forwarding on the deployment directly, and that would simplify the script, because we do not need to search pods - deployment name is fixed.

pedjak · 2025-11-04T12:02:06Z

hack/tools/e2e-profiling/collect-profiles.sh

+
+# Wait for port forwards to be ready
+log_info "Waiting for port forwards to initialize..."
+sleep 5


instead of wait here, I would try to connect to the endpoint and retry several times if I get the error.

pedjak · 2025-11-04T12:03:46Z

hack/tools/e2e-profiling/collect-profiles.sh

+log_success "Collection complete. Collected ${n} profiles from each component."
+
+# Clean up empty profiles (created during cluster teardown)
+log_info "Cleaning up empty profile files..."


perhaps this should be executed also when user presses ctrl-c?

pedjak · 2025-11-04T12:05:15Z

hack/tools/e2e-profiling/e2e-profile.sh

+EOF
+}
+
+log_info() {
+    echo -e "${BLUE}[INFO]${NC} $*"
+}
+
+log_error() {
+    echo -e "${RED}[ERROR]${NC} $*"
+}
+
+log_success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $*"
+}


All added scripts have a share of common code. Perhaps we should extract it into something like common.sh and source it other scripts?

pedjak · 2025-11-04T12:12:06Z

helm/olmv1/templates/deployment-olmv1-system-catalogd-controller-manager.yml

            - --leader-elect
            {{- end }}
            - --metrics-bind-address=:7443
+            {{- if .Values.options.e2e.enabled }}


Profiling could we turned on also for other reasons, not just for e2e tests. Hence, I would suggest to name this option something like:

.Values.options.profiling.enabled

pedjak · 2025-11-04T12:14:26Z

hack/tools/e2e-profiling/PLUGIN_SUMMARY.md

+---
+
+**Created:** 2025-10-28
+**Status:** Production Ready


Claude thinks it is. :)

pedjak · 2025-11-04T12:15:07Z

hack/tools/e2e-profiling/PLUGIN_SUMMARY.md

+
+**Created:** 2025-10-28
+**Status:** Production Ready
+**Version:** 1.0.0


Are we gonna track the version of this separately from the released artifacts?

There's no need to version this, so we won't bother trying to keep it in sync.

pedjak · 2025-11-04T12:16:00Z

hack/tools/e2e-profiling/USAGE_EXAMPLES.md

+# This will:
+# 1. Start make test-experimental-e2e
+# 2. Wait for pod to be ready
+# 3. Collect profiles every 15s


the code and PR description are actually speaking about 10s intervals.

Yup, I changed the default, and Claude didn't pick up on this.

pedjak · 2025-11-04T12:17:12Z

hack/tools/e2e-profiling/USAGE_EXAMPLES.md

+
+```bash
+# Set up symlinks to existing data
+ln -s /home/tshort/experimental-e2e-testing/test1 e2e-profiles/baseline


perhaps we should make some more abstract paths instead of publishing /home/tshort ?

Yeah, this is probably a bad example... and Claude didn't clean it up.

tmshort · 2025-11-04T14:48:52Z

Given that all other interactions with the code are happening through Makefile targets, it would be nice if we can continue doing that for the profiling as well, something like:
make baseline-profile test-experiment-e2e
or maybe
make test-experiment-e2e PROFILING=baseline
Similar goes for other added scripts.

I'm going to disagree here. We have a number of other utilities that can be run outside the Makefile. Also, passing arguments to Makefiles is onerous (as given by your example of having to use PROFILING= to specify the name of the profile).

pedjak · 2025-11-04T16:48:03Z

I'm going to disagree here. We have a number of other utilities that can be run outside the Makefile. Also, passing arguments to Makefiles is onerous (as given by your example of having to use PROFILING= to specify the name of the profile).

We should not prevent people to call the scripts directly, but having no target on Makefile level makes them harder to discover.

I would also say that profiling itself is orthogonal to running e2e tests. We could profile controllers even if do not run tests. If we make target for starting and stopping profiling, then we could do something like:

make baseline-profiling e2e stop-profiling

If we want to profile without running tests, we can then do

make baseline-profiling
# play with OLM
# ...
# ...
make stop-profiling

anik120

Consider this separation:

e2e profiling toolchain for heap and CPU analysis in introduced (independent of any Claude/AI stuff)
A helpful Claude command is introduced to help with the AI-ness (in a different PR)

Isn't a that more cleaner?

1 is something we'd do traditionally (but now it's being done with assistance from AI).
2 is the new world stuff.

anik120 · 2025-11-04T18:40:28Z

hack/tools/e2e-profiling/PLUGIN_SUMMARY.md

+# E2E Profiling Plugin - Summary
+
+## 📊 What Was Created


Same comment as the other PR, we probably shouldn't commit this to main. The Readme.md and Usage.md seems to be doing the job.

This file is useful for Claude to know how to use the tool as a plug-in.

tmshort · 2025-11-04T18:44:20Z

We should not prevent people to call the scripts directly, but having no target on Makefile level makes them harder to discover.

I would also say that profiling itself is orthogonal to running e2e tests. We could profile controllers even if do not run tests. If we make target for starting and stopping profiling, then we could do something like:
make baseline-profiling e2e stop-profiling
If we want to profile without running tests, we can then do
make baseline-profiling
# play with OLM
# ...
# ...
make stop-profiling

I hear your concerns. Because the scripts are expecting to run a specific e2e test, that would require a bit of refactoring. The functionality works now with the existing e2e tests. I would consider doing that as a follow up.

tmshort · 2025-11-04T19:13:32Z

Consider this separation:

e2e profiling toolchain for heap and CPU analysis in introduced (independent of any Claude/AI stuff)

A helpful Claude command is introduced to help with the AI-ness (in a different PR)

Isn't a that more cleaner?

1 is something we'd do traditionally (but now it's being done with assistance from AI). 2 is the new world stuff.

That's starting to sound like busy-work. I can do it, but I don't really see the benefit.
EDIT: I will also point out that having two separate PRs make it harder to test together... but that's a me problem.

tmshort · 2025-11-04T22:39:53Z

/hold
I'm updating this to work with make... so holding

Review is stale.

tmshort · 2025-11-05T18:55:28Z

/unhold

pedjak · 2025-11-06T08:28:34Z

I did a lint pass with ShellCheck:

In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 34:
    local profile_count=$(find -L "${component_dir}" -maxdepth 1 -name "heap*.pprof" -type f 2>/dev/null | wc -l)
          ^-----------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 43:
    local peak_profile=$(ls -lS "${component_dir}"/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
          ^----------^ SC2155 (warning): Declare and assign separately to avoid masking return values.
                         ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In analyze-profiles.sh line 44:
    local peak_name=$(basename "${peak_profile}")
          ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 45:
    local peak_size=$(du -h "${peak_profile}" | cut -f1)
          ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 54:
    local peak_stats=$(cd "${component_dir}" && go tool pprof -top "${peak_name}" 2>/dev/null | head -6)
          ^--------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 55:
    local peak_total=$(echo "${peak_stats}" | grep "^Showing" | awk '{print $7, $8}')
          ^--------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 74:
    for f in $(ls "${component_dir}"/heap*.pprof 2>/dev/null | sort -V); do
               ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In analyze-profiles.sh line 75:
        local name=$(basename "$f")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 76:
        local size=$(stat -c%s "$f")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 192:
    local cpu_profile_count=$(find -L "${component_dir}" -maxdepth 1 -name "cpu*.pprof" -type f 2>/dev/null | wc -l)
          ^---------------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 201:
    local peak_cpu_profile=$(ls -lS "${component_dir}"/cpu*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
          ^--------------^ SC2155 (warning): Declare and assign separately to avoid masking return values.
                             ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In analyze-profiles.sh line 202:
    local peak_cpu_name=$(basename "${peak_cpu_profile}")
          ^-----------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 203:
    local peak_cpu_size=$(du -h "${peak_cpu_profile}" | cut -f1)
          ^-----------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 209:
    local cpu_stats=$(cd "${component_dir}" && go tool pprof -top "${peak_cpu_name}" 2>/dev/null | head -6)
          ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 210:
    local cpu_total=$(echo "${cpu_stats}" | grep "^Showing" | awk '{print $7, $8}')
          ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In analyze-profiles.sh line 385:
    peak_profile=$(ls -lS "${component_dir}"/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
                   ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In collect-profiles.sh line 51:
    source "${STATE_FILE}"
           ^-------------^ SC1090 (warning): ShellCheck can't follow non-constant source. Use a directive to specify location.


In common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In compare-profiles.sh line 47:
PEAK1=$(ls -lS "${TEST1_DIR}"/operator-controller/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
        ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 48:
PEAK2=$(ls -lS "${TEST2_DIR}"/operator-controller/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
        ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 51:
PEAK1_CATALOGD=$(ls -lS "${TEST1_DIR}"/catalogd/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
                 ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 52:
PEAK2_CATALOGD=$(ls -lS "${TEST2_DIR}"/catalogd/heap*.pprof 2>/dev/null | head -1 | awk '{print $NF}')
                 ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 137:
    (ls "${TEST1_DIR}"/operator-controller/heap*.pprof 2>/dev/null | sed 's/.*heap\([0-9]*\)\.pprof/\1/';
     ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 138:
     ls "${TEST2_DIR}"/operator-controller/heap*.pprof 2>/dev/null | sed 's/.*heap\([0-9]*\)\.pprof/\1/') \
     ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 192:
    (ls "${TEST1_DIR}"/catalogd/heap*.pprof 2>/dev/null | sed 's/.*heap\([0-9]*\)\.pprof/\1/';
     ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In compare-profiles.sh line 193:
     ls "${TEST2_DIR}"/catalogd/heap*.pprof 2>/dev/null | sed 's/.*heap\([0-9]*\)\.pprof/\1/') \
     ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In run-profiled-test.sh line 138:
        TEST_EXIT=$?
        ^-------^ SC2034 (warning): TEST_EXIT appears unused. Verify use (or export if used externally).


In run-profiled-test.sh line 152:
            LATEST=$(ls -t "${OUTPUT_DIR}"/heap*.pprof 2>/dev/null | head -1)
                     ^-- SC2012 (info): Use find instead of ls to better handle non-alphanumeric filenames.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In start-profiling.sh line 70:
nohup bash -c '
              ^-- SC2016 (info): Expressions don't expand in single quotes, use double quotes for that.


In ./common.sh line 44:
        local dir="$(cd -P "$(dirname "$source")" && pwd)"
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 59:
        local dir=$(dirname "$path")
              ^-^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 60:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 64:
        local parent=$(dirname "$path")
              ^----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In ./common.sh line 65:
        local base=$(basename "$path")
              ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In stop-profiling.sh line 52:
source "${STATE_FILE}"
       ^-------------^ SC1090 (warning): ShellCheck can't follow non-constant source. Use a directive to specify location.

For more information:
  https://www.shellcheck.net/wiki/SC1090 -- ShellCheck can't follow non-const...
  https://www.shellcheck.net/wiki/SC2034 -- TEST_EXIT appears unused. Verify ...
  https://www.shellcheck.net/wiki/SC2155 -- Declare and assign separately to ...

pedjak · 2025-11-05T19:25:33Z

Makefile

+start-profiling: #HELP Start profiling in background. Run your tests, then use 'make stop-profiling'. Use PROFILE_NAME=<name> to specify output name.
+	./hack/tools/e2e-profiling/start-profiling.sh $(PROFILE_NAME)


if we craft the target like:

start-profiling/%: ./hack/tools/e2e-profiling/start-profiling.sh $*

then we could simply the invocation with:

make start-profiling/foo

pedjak · 2025-11-06T08:20:14Z

hack/tools/e2e-profiling/analyze-profiles.sh

+    cat >> "${REPORT_FILE}" << 'EOF'
+
+### Top Memory Allocators (Peak Profile)
+
+```
+EOF
+
+    cd "${component_dir}" && go tool pprof -top "${peak_name}" 2>/dev/null | head -20 >> "${REPORT_FILE}"
+
+    cat >> "${REPORT_FILE}" << 'EOF'
+```
+
+EOF
+
+    # OpenAPI-specific analysis
+    log_info "  Analyzing OpenAPI allocations..." >&2
+    cat >> "${REPORT_FILE}" << 'EOF'
+
+### OpenAPI-Related Allocations
+
+```
+EOF


I think these three cat invocation (and perhaps others too) could be merged into single one:

cat >> "${REPORT_FILE}" << EOF ### Top Memory Allocators (Peak Profile) ` `` $(cd "${component_dir}" && go tool pprof -top "${peak_name}" 2>/dev/null | head -20) ` `` ### OpenAPI-Related Allocations ` `` $(cd "${component_dir}" && go tool pprof -text "${peak_name}" 2>/dev/null | grep -i openapi | head -20 >> "${REPORT_FILE}" || echo "No OpenAPI allocations found") ` `` EOF

It would mprove readability if we wrap the calls to go tool pprof into bash functions.

Can you please confirm your requsted change? I don't think it would work as given.

Can you please confirm your requsted change? I don't think it would work as given.

It is a nit, but IMHO is more readable. See this little example:

function bar() { echo "hello world" } cat << EOF first line $(bar) second line EOF

I've cleaned this up a bit, hopefully more readable.

pedjak · 2025-11-06T08:33:13Z

hack/tools/e2e-profiling/start-profiling.sh

+
+# Start everything in background
+nohup bash -c '
+set -euo pipefail


could we move this into a separare script instead keeping it in this large string?

tmshort · 2025-11-06T14:41:43Z

I did a lint pass with ShellCheck:

We don't use ShellCheck for anything... so that is a requirement that has not been placed on anything else. I don't think it's appropriate to apply it on this and not anything else. In other words, if we want to use ShellCheck, we should create a verify target for it. It does pass a bash -n check.

However, I'll look at some of those issues.

camilamacedo86 · 2025-11-06T15:30:40Z

helm/olmv1/templates/deployment-olmv1-system-operator-controller-controller-manager.yml

            - --metrics-bind-address=:8443
+            {{- if .Values.options.profiling.enabled }}
+            - --pprof-bind-address=:6060
+            {{- end }}


👍 Liked we do not do that by default
Perfect

camilamacedo86 · 2025-11-06T15:34:27Z

hack/tools/e2e-profiling/README.md

+- [ ] Add real-time dashboard
+- [ ] Support different output formats (JSON, CSV)
+- [ ] Add mutex profiling
+- [ ] Support custom component configurations


I think it all is generated by AI
Could we just shape it and have only what is required here, making sense?

I've reduced it.

pedjak · 2025-11-06T15:46:20Z

general comment/question: the introduced logic is not so trivial, and I wonder if we would get on maintainability if it is written in some other more suited language than bash? (not pushing for anything particular here)

tmshort · 2025-11-06T15:56:45Z

general comment/question: the introduced logic is not so trivial, and I wonder if we would get on maintainability if it is written in some other more suited language than bash? (not pushing for anything particular here)

If it's non-trivial in bash, it's likely non-trivial in any other language. Given how far this has progressed, I'm reluctant to switch to another language and take even longer to get this merged. I've also tried to minimize the number of required tools, and doing it in anything other than Bash or Go will add new build environment requirements. As of now, nothing new is required to run this.

That being said... I can look to see what would happen if it were written in golang, but I don't want that to hold up this PR.

pedjak

/lgtm thanks!

camilamacedo86 · 2025-11-07T01:33:18Z

helm/olmv1/values.yaml

  e2e:
    enabled: false
+  profiling:
+    enabled: false


👍 for stable disabled by default great

camilamacedo86 · 2025-11-07T10:58:23Z

hack/tools/e2e-profiling/README.md

+## Requirements
+
+- kubectl (cluster access)
+- go (for `go tool pprof`)
+- make
+- curl
+- bash 4.0+
+
+## Real-World Results
+
+OpenAPI caching optimization:
+- **Memory:** 49.6 MB → 41.2 MB (-16.9%)
+- **OpenAPI allocations:** 13 MB → 3.5 MB (-73%)
+- **Key insight:** Repeated schema fetching was #1 memory consumer
+
+## Architecture
+
+**Scripts:**
+- `profile-collector-daemon.sh` - Background collection process
+- `start-profiling.sh` - Start daemon mode profiling
+- `stop-profiling.sh` - Stop daemon and cleanup
+- `run-profiled-test.sh` - Orchestrate test + profiling
+- `analyze-profiles.sh` - Generate analysis reports
+- `compare-profiles.sh` - Create comparison reports
+- `common.sh` - Shared utilities and logging
+
+**Key Features:**
+- Deployment-based port-forwarding (survives pod restarts)
+- Automatic retry with 30s timeout
+- Graceful cleanup on exit/interrupt
+- Multi-component support (operator-controller + catalogd)


Suggested change

## Requirements

- kubectl (cluster access)

- go (for `go tool pprof`)

- make

- curl

- bash 4.0+

## Real-World Results

OpenAPI caching optimization:

- **Memory:** 49.6 MB → 41.2 MB (-16.9%)

- **OpenAPI allocations:** 13 MB → 3.5 MB (-73%)

- **Key insight:** Repeated schema fetching was #1 memory consumer

## Architecture

**Scripts:**

- `profile-collector-daemon.sh` - Background collection process

- `start-profiling.sh` - Start daemon mode profiling

- `stop-profiling.sh` - Stop daemon and cleanup

- `run-profiled-test.sh` - Orchestrate test + profiling

- `analyze-profiles.sh` - Generate analysis reports

- `compare-profiles.sh` - Create comparison reports

- `common.sh` - Shared utilities and logging

**Key Features:**

- Deployment-based port-forwarding (survives pod restarts)

- Automatic retry with 30s timeout

- Graceful cleanup on exit/interrupt

- Multi-component support (operator-controller + catalogd)

I think we can remove this section — what do you think?
It looks like it might have been AI-generated and doesn’t seem necessary for our use case.
It also feels a bit verbose and not quite aligned with the writing style we usually use for these READMEs.

I think the level is fine. It was definitely reduced, and can always be edited back later. Generally, for documentation, more detail is better.

This is actually a great example of how it's helpful to generate docs like this to provide additional context.

and not quite aligned with the writing style we usually use for these READMEs.

Writing styles before were plagued with not-enough context syndrome, which made contributors reliant on person/people who wrote the code to understand context (tribal knowledge).

I personally saw this file while doing my first "eye over" of the PR, and was grateful to get more context from this file about what to expect from this PR (just like how reading the summary of a research paper helps understand the rest of the paper)

camilamacedo86 · 2025-11-07T11:01:12Z

.gitignore

+.claude/cache/
+.claude/logs/
+.claude/.session*
+.claude/*.log


Could not be

# Ignore Claude local settings, history, and logs .claude/

Also, it seems out of scope here
Are we adding any claude implementation?

This is what Claude said to include in the .gitignore; it knows what should be there! :)

@tmshort this does feel a little out of place tbh. Everything else is looking great.

camilamacedo86

Hey @tmshort

I added just a few nits.
I think the README we need to shape
AI will add info not really required and or not helpful.
Often too verbose, so I think it requires some human adjustments.
PS.: I did not test it out locally.

Beyond that I am ok with we add it.
@dtfranz worked in the performance checks so I think it would be nice get his review.

anik120

@todd there's just one little tiny thing bothering me otherwise this is looking great!

anik120 · 2025-11-07T16:01:01Z

hack/tools/e2e-profiling/README.md

+## Requirements
+
+- kubectl (cluster access)
+- go (for `go tool pprof`)
+- make
+- curl
+- bash 4.0+
+
+## Real-World Results
+
+OpenAPI caching optimization:
+- **Memory:** 49.6 MB → 41.2 MB (-16.9%)
+- **OpenAPI allocations:** 13 MB → 3.5 MB (-73%)
+- **Key insight:** Repeated schema fetching was #1 memory consumer
+
+## Architecture
+
+**Scripts:**
+- `profile-collector-daemon.sh` - Background collection process
+- `start-profiling.sh` - Start daemon mode profiling
+- `stop-profiling.sh` - Stop daemon and cleanup
+- `run-profiled-test.sh` - Orchestrate test + profiling
+- `analyze-profiles.sh` - Generate analysis reports
+- `compare-profiles.sh` - Create comparison reports
+- `common.sh` - Shared utilities and logging
+
+**Key Features:**
+- Deployment-based port-forwarding (survives pod restarts)
+- Automatic retry with 30s timeout
+- Graceful cleanup on exit/interrupt
+- Multi-component support (operator-controller + catalogd)


This is actually a great example of how it's helpful to generate docs like this to provide additional context.

and not quite aligned with the writing style we usually use for these READMEs.

Writing styles before were plagued with not-enough context syndrome, which made contributors reliant on person/people who wrote the code to understand context (tribal knowledge).

I personally saw this file while doing my first "eye over" of the PR, and was grateful to get more context from this file about what to expect from this PR (just like how reading the summary of a research paper helps understand the rest of the paper)

anik120 · 2025-11-07T16:03:13Z

.gitignore

+.claude/cache/
+.claude/logs/
+.claude/.session*
+.claude/*.log


@tmshort this does feel a little out of place tbh. Everything else is looking great.

tmshort · 2025-11-07T17:00:32Z

OK, I've converted this into golang (vs. bash) and generalized it a bit more:

different namespaces for catd and opcon
renamed from e2e-profiles to make it more generic
Reduced the verbiage of the README.md (sorry, @anik120)
Removed AI-specific stuff (specifically around .gitignore)

camilamacedo86 · 2025-11-07T17:48:41Z

hack/tools/test-profiling/README.md

+## Requirements
+
+**Runtime:**
+- Kubernetes cluster access (via kubeconfig)
+
+**Build:**
+- go 1.24+
+- make


Will be hard we keep it maintained
when we bump do we really need this section?

I'm looking to remove it, and make other improvements... this is becoming an HnH project today,...

camilamacedo86

Great work ! 🎉
Nit: Do we need the requirement section in the doc
if not, I think is better we remove since will be hard for us keep maintained.

Introduces automated memory and CPU profiling for e2e tests with: - Automatic port-forwarding to Kubernetes deployment pprof endpoints - Configurable periodic heap and CPU profile collection with differential timing - Analysis report generation with growth metrics and top allocators - Makefile targets: start-profiling, stop-profiling, analyze-profiles 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Todd Short <[email protected]>

openshift-ci · 2025-11-07T22:25:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: camilamacedo86, pedjak
Once this PR has been reviewed and has the lgtm label, please assign kevinrizza for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tmshort requested a review from a team as a code owner November 3, 2025 15:22

openshift-ci bot requested review from OchiengEd and dtfranz November 3, 2025 15:22

tmshort changed the title ~~✨ OPRUN:4240: Add e2e profiling toolchain for heap and CPU analysis~~ ✨ OPRUN-4240: Add e2e profiling toolchain for heap and CPU analysis Nov 3, 2025

tmshort mentioned this pull request Nov 3, 2025

🐛 OPRUN-4239: Memory usage improvements #2290

Merged

pedjak reviewed Nov 4, 2025

View reviewed changes

tmshort force-pushed the memory-profiling branch from 881c6a5 to e02313c Compare November 4, 2025 15:51

tmshort requested a review from pedjak November 4, 2025 15:52

anik120 previously requested changes Nov 4, 2025

View reviewed changes

openshift-ci bot assigned anik120 Nov 4, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2025

tmshort force-pushed the memory-profiling branch from e02313c to a645f32 Compare November 5, 2025 18:44

tmshort mentioned this pull request Nov 5, 2025

✨ OPRUN-4240: Add Claude Code integration for e2e profiling #2309

Open

4 tasks

tmshort force-pushed the memory-profiling branch from a645f32 to eb1ecbb Compare November 5, 2025 18:51

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2025

pedjak reviewed Nov 6, 2025

View reviewed changes

camilamacedo86 reviewed Nov 6, 2025

View reviewed changes

tmshort force-pushed the memory-profiling branch from eb1ecbb to c121576 Compare November 6, 2025 17:45

pedjak approved these changes Nov 6, 2025

View reviewed changes

openshift-ci bot assigned pedjak Nov 6, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2025

camilamacedo86 reviewed Nov 7, 2025

View reviewed changes

helm/olmv1/values.yaml

e2e:

enabled: false

profiling:

enabled: false

Copy link

Contributor

camilamacedo86 Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for stable disabled by default great

camilamacedo86 reviewed Nov 7, 2025

View reviewed changes

anik120 reviewed Nov 7, 2025

View reviewed changes

tmshort force-pushed the memory-profiling branch from c121576 to 8f49c58 Compare November 7, 2025 16:56

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

tmshort changed the title ~~✨ OPRUN-4240: Add e2e profiling toolchain for heap and CPU analysis~~ ✨ OPRUN-4240: Add test profiling toolchain for heap and CPU analysis Nov 7, 2025

camilamacedo86 reviewed Nov 7, 2025

View reviewed changes

camilamacedo86 approved these changes Nov 7, 2025

View reviewed changes

openshift-ci bot assigned camilamacedo86 Nov 7, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

tmshort force-pushed the memory-profiling branch from 8f49c58 to 5bf313d Compare November 7, 2025 20:49

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

tmshort force-pushed the memory-profiling branch from 5bf313d to 5147fad Compare November 7, 2025 20:50

tmshort force-pushed the memory-profiling branch from 5147fad to a75a2b0 Compare November 7, 2025 20:56

camilamacedo86 approved these changes Nov 7, 2025

View reviewed changes

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

		start-profiling: #HELP Start profiling in background. Run your tests, then use 'make stop-profiling'. Use PROFILE_NAME=<name> to specify output name.
		./hack/tools/e2e-profiling/start-profiling.sh $(PROFILE_NAME)

✨ OPRUN-4240: Add test profiling toolchain for heap and CPU analysis #2298

Are you sure you want to change the base?

✨ OPRUN-4240: Add test profiling toolchain for heap and CPU analysis #2298

Uh oh!

Conversation

tmshort commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer Checklist

Uh oh!

netlify bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for olmv1 ready!

Uh oh!

codecov bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pedjak left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tmshort commented Nov 4, 2025

Uh oh!

pedjak commented Nov 4, 2025

Uh oh!

anik120 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tmshort commented Nov 4, 2025

Uh oh!

tmshort commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tmshort commented Nov 4, 2025

Uh oh!

tmshort commented Nov 5, 2025

Uh oh!

pedjak commented Nov 6, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pedjak Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

tmshort commented Nov 3, 2025 •

edited

Loading

netlify bot commented Nov 3, 2025 •

edited

Loading

codecov bot commented Nov 3, 2025 •

edited

Loading

tmshort commented Nov 4, 2025 •

edited

Loading

pedjak Nov 6, 2025 •

edited

Loading

camilamacedo86 Nov 7, 2025 •

edited

Loading

camilamacedo86 left a comment •

edited

Loading