Skip to content

Conversation

bderrly
Copy link

@bderrly bderrly commented Aug 15, 2025

Description: Returns a valid, empty configuration struct in the v1alpha1 conversion webhook rather than returning a failure.

Link to tracking Issue(s): #4288

Testing: Added a few tests with invalid YAML configuration strings to ensure that a valid configuration, and no error, is returned from the webhook.

Documentation: None

@bderrly bderrly requested a review from a team as a code owner August 15, 2025 16:17
Copy link
Member

@frzifus frzifus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bderrly , could you please add a changelog entry and does it make sense to add an e2e test too? :)

@pavolloffay
Copy link
Member

@CodiumAI-Agent /review

This change is meant to make the conversion webhook to be more resilient
in the face of invalid input.

It is rare, though possible, for an opentelemetrycollector custom
resource to be stored as a v1alpha1 version and with an invalid
configuration. With an older version of the operator running this would
simply result in no collector starting due to an invalid
configuration.

However, once the operator is updated to a new enough version where
v1beta1 is the stored version, this invalid configuration will result in
the conversion webhook failing. After several minutes the operator will
crash.

On a production cluster we had this situation result in the resource
quota controller failing to initialize after a control-plane upgrade.
This resulted in an incident where new jobs could not be created.
@bderrly bderrly force-pushed the api-conversion-resilience branch from f059f52 to 780bc39 Compare August 21, 2025 00:29
@bderrly
Copy link
Author

bderrly commented Aug 21, 2025

Thanks for taking a look @frzifus. I added the changelog file as well as an e2e test.

@frzifus frzifus requested a review from pavolloffay August 21, 2025 07:40
@@ -0,0 +1,19 @@
apiVersion: opentelemetry.io/v1alpha1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does v1beta1 fail on the admision with this config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since Config changed from string to an object:

Error from server: error when creating "otel-test.yaml": admission webhook "mopentelemetrycollectorbeta.kb.io" denied the request: json: cannot unmarshal string into Go struct field OpenTelemetryCollectorSpec.spec.config of type v1beta1.Config

@swiatekm swiatekm requested a review from pavolloffay August 29, 2025 12:17
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for contributing this fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API conversion webhook failure can cause cluster-wide outage
5 participants