Skip to content

Conversation

omerap12
Copy link
Member

What type of PR is this?

/kind documentation
/kind feature
/kind api-change

What this PR does / why we need it:

AEP for #7650

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler labels Apr 12, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 12, 2025
Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
@omerap12
Copy link
Member Author

/assign @raywainman

Signed-off-by: Omer Aplatony <[email protected]>
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@omerap12 omerap12 changed the title AEP: per-vpa-component-configuration AEP-8026: per-vpa-component-configuration Apr 18, 2025
Copy link
Member

@raywainman raywainman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this @omerap12, thanks for putting this together!

* recommendationMarginFraction
* Other parameters that benefit from workload-specific tuning

### Validation via CEL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we still do some basic validating in the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, e2e tests will be included: 5b89607

Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
@adrianmoisey
Copy link
Member

Do these API changes need to be feature-gated and go through the process of:

  1. Default off
  2. Default on
  3. GA

I don't think the VPA project really promises a downgrade path, but Kubernetes does. If we use what Kubernetes has defined, then we may need to do this.

@omerap12
Copy link
Member Author

omerap12 commented May 2, 2025

Do these API changes need to be feature-gated and go through the process of:

  1. Default off
  2. Default on
  3. GA

I don't think the VPA project really promises a downgrade path, but Kubernetes does. If we use what Kubernetes has defined, then we may need to do this.

I thought you are on vacation!
But I don't think we need to do the all feature gate process , maybe I'm wrong here..
@raywainman @voelzmo wdyt?

@adrianmoisey
Copy link
Member

I thought you are on vacation!

I'm easing back into open source over the next week or so, and playing catch up.

But I don't think we need to do the all feature gate process , maybe I'm wrong here..

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

In the past it seems as though we loosely adopt what Kubernetes promises (ability to downgrade), but that doesn't seem to be a formal decision.

@omerap12
Copy link
Member Author

omerap12 commented May 2, 2025

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

@adrianmoisey
Copy link
Member

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

Something like this is what I'm thinking:

  1. Someone upgrades the VPA to a version including this feature (CRDs and all components upgraded)
  2. They then use one of these fields for a VPA
  3. They downgrade. I assume the CRD is not downgraded... only a VPA component, to a version that doesn't support the new feature

Their previously-set VPA setting (oomBumpUpRatio, for example) no longer works.

It's not a big deal, but it's how Kubernetes rolls out new features.

@omerap12
Copy link
Member Author

omerap12 commented May 3, 2025

I think it depends on what our promise is to the user. If we promise the ability to downgrade by 1 version, then we need to do the feature-gating
If we don't promise anything, then we're good and don't need to feature-gate.

What does downgrade means in this context? cause you are just adding fields..

Something like this is what I'm thinking:

  1. Someone upgrades the VPA to a version including this feature (CRDs and all components upgraded)
  2. They then use one of these fields for a VPA
  3. They downgrade. I assume the CRD is not downgraded... only a VPA component, to a version that doesn't support the new feature

Their previously-set VPA setting (oomBumpUpRatio, for example) no longer works.

It's not a big deal, but it's how Kubernetes rolls out new features.

I understand. Right now, if someone updates the VPA and sets a different oomBumpUpRatio than the global setting, and then later downgrades, it will go back to the global value.
I can add a feature gate if you think it's needed.

@adrianmoisey
Copy link
Member

I understand. Right now, if someone updates the VPA and sets a different oomBumpUpRatio than the global setting, and then later downgrades, it will go back to the global value.
I can add a feature gate if you think it's needed.

Yeah, the downgrade isn't a big problem with this change.
I think the big question is if we want to align with what Kubernetes does, or if we're happy with not supporting a super clean downgrade.

I think the VPA is a lower risk than Kubernetes, so we could get away not worrying about a downgrade, but, someone may expect the same from the VPA as they do from Kubernetes. So I'm on the fence with this one.
I've added an item to the sig-autoscaling meeting (for the 12th May) to discuss it

Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
@omerap12
Copy link
Member Author

@adrianmoisey PTAL :)

@adrianmoisey
Copy link
Member

/lgtm
/hold

I'd really love more eyes on this, hence holding it

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 28, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 28, 2025
@omerap12
Copy link
Member Author

omerap12 commented Jul 1, 2025

cc @kubernetes/sig-autoscaling-leads
For more context: #7650

Extend `ContainerResourcePolicy` with:
* `oomBumpUpRatio`
* `oomMinBumpUp`
* `memoryAggregationInterval`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why you want to expose this and not also expose the count?

Do we just assume that 8 intervals is good and we can just adjust the time interval instead to suit the need?

I'm curious how much success you've had in tweaking this in your own deployment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no specific reason — the list above just includes what I thought were the most useful fields to expose. I can also expose the count field since it seems straightforward. We'll continue exposing new fields based on user requirements, as mentioned here: https://github.com/kubernetes/autoscaler/pull/8026/files#diff-b666c5599d02971edb0c08090acb86761b7c726c97c4c6c1bbf00ff5883a2364R190

@raywainman
Copy link
Member

Overall LGTM, just a couple comments. Apologies for the delay here team...

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 11, 2025
@omerap12 omerap12 requested a review from raywainman July 12, 2025 19:54
@adrianmoisey
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 14, 2025
@raywainman
Copy link
Member

/lgtm

Modulo the discussion on Quantity vs String for the OomBumpUpRatio which can be figured out during implementation.

Signed-off-by: Omer Aplatony <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2025
@adrianmoisey
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2025
@omerap12
Copy link
Member Author

Ping @raywainman for final approval.

@raywainman
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omerap12, raywainman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2025
@omerap12
Copy link
Member Author

/unhold
Thanks folks!

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit c187e7f into kubernetes:master Jul 18, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants