feat(recommender): add OOMMinBumpUp&OOMBumpUpRatio to CRD #8012
base: master
Conversation
We might want to create a proper AEP for this, but this is the general direction I'm thinking. I can open additional issues to track the specific flags we'd like to support for this type of configuration.
Hey @omerap12 thanks for the PR!
I agree it makes sense to be able to configure the OOM bump behavior on VPA level. There are a few questions on how to implement this, though:
- I'm not sure if we want this to be a configuration on Container level or on Pod level, i.e. should this apply to all Containers controlled by a certain VPA, or should this rather be something that's controlled per individual Container? I think so far we've been mostly offering configurations on Container level, so probably that would also apply here. Or do we have some indication that people who want to configure custom OOM bumps want to do this for all Containers of a Pod in the same way?
- I don't think we should introduce a new configuration type `recommenderConfig`. Technically, all of these properties are configuration options of the recommender (histogram decay options, maxAllowed, minAllowed, which resources to include in the recommendations, etc.), so this doesn't seem like a reasonable way to group things. If we agree to make this configuration Container specific, I'd rather add it to the `ContainerResourcePolicy`.
- Currently, the OOM bump configuration is part of the `AggregationsConfig`, as it is assumed to be globally configured, like all the other options in there. This config is only initialized once, in `main.go`: `model.InitializeAggregationsConfig(model.NewAggregationsConfig(*memoryAggregationInterval, *memoryAggregationIntervalCount, *memoryHistogramDecayHalfLife, *cpuHistogramDecayHalfLife, *oomBumpUpRatio, *oomMinBumpUp))`
- If we do, however, want to make this configurable per VPA, I'd rather opt for pushing this configuration down, rather than adding some if-else to the cluster_feeder that would mean having to find the correct VPA for a Pod every time we add an OOM sample.
- IMHO, a possible place to put these configuration options would be the `aggregate_container_state`, where we already have the necessary methods to re-load the ContainerResourcePolicy options on VPA updates, and then read them in `cluster.go`, right before we add the OOM sample to the ContainerAggregation: `err := containerState.RecordOOM(timestamp, requestedMemory)`
WDYT?
Thanks for the input!
So yep, I agree with all of your suggestions :)
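To make the suggested direction concrete, below is a minimal, hedged sketch of the fallback logic it implies: per-container values kept next to the aggregate container state and resolved against the recommender's global defaults just before the OOM sample is recorded. The type, field, and helper names are assumptions for illustration only, not the actual VPA API; only `RecordOOM` and `ContainerResourcePolicy` come from the discussion above.

```go
// Hypothetical sketch only; field and helper names are assumptions, not the VPA API.
// It illustrates resolving per-container OOM bump settings against the recommender's
// global defaults right before an OOM sample would be recorded.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// containerOOMPolicy mirrors the idea of carrying the two knobs on the
// per-container resource policy and re-loading them on VPA updates.
type containerOOMPolicy struct {
	OOMBumpUpRatio *float64           // nil means "use the global flag value"
	OOMMinBumpUp   *resource.Quantity // nil means "use the global flag value"
}

// resolveOOMBump returns the effective ratio and minimum bump (in bytes),
// falling back to the globally configured defaults when a value is unset.
func resolveOOMBump(p *containerOOMPolicy, defaultRatio float64, defaultMinBump int64) (float64, int64) {
	ratio, minBump := defaultRatio, defaultMinBump
	if p != nil && p.OOMBumpUpRatio != nil {
		ratio = *p.OOMBumpUpRatio
	}
	if p != nil && p.OOMMinBumpUp != nil {
		minBump = p.OOMMinBumpUp.Value()
	}
	return ratio, minBump
}

func main() {
	ratio := 1.5
	minBumpQty := resource.MustParse("100Mi")
	perContainer := &containerOOMPolicy{OOMBumpUpRatio: &ratio, OOMMinBumpUp: &minBumpQty}

	// Values a recommender would use just before calling RecordOOM for this container.
	r, m := resolveOOMBump(perContainer, 1.2, 100*1024*1024)
	fmt.Printf("effective ratio=%v, min bump=%d bytes\n", r, m)
}
```

The design point is that the resolved values live next to the aggregate state that `RecordOOM` already uses, so the cluster_feeder never has to look up the owning VPA for every OOM sample.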
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: omerap12. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/remove area provider/cluster-api
/remove-area provider/cluster-api
/label tide/merge-method-squash
I haven't looked at the tests yet, but here are a few small comments
/label api-review
// check that perVPA is on if being used
if err := validatePerVPAFeatureFlag(&policy); err != nil {
	return err
}
This piece of code made me wonder if we should try to move closer to how k/k validates resources. That's something for another day, though.
Yeah, we should create a separate issue to discuss it.
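For illustration, a self-contained, hedged sketch of what the feature-gate check in the excerpt above could look like. Aside from the `validatePerVPAFeatureFlag` name, which appears in the diff, the types, field names, and gate variable are assumptions, not the actual implementation.

```go
// Hypothetical, self-contained sketch: reject per-VPA OOM bump settings when the
// (assumed) feature gate is disabled. The real handler works on the VPA API types.
package main

import "fmt"

type containerPolicy struct {
	ContainerName  string
	OOMBumpUpRatio *float64
	OOMMinBumpUp   *int64
}

type podResourcePolicy struct {
	ContainerPolicies []containerPolicy
}

// perVPAOOMBumpEnabled stands in for an assumed feature gate.
var perVPAOOMBumpEnabled = false

func validatePerVPAFeatureFlag(policy *podResourcePolicy) error {
	if policy == nil || perVPAOOMBumpEnabled {
		return nil
	}
	for _, cp := range policy.ContainerPolicies {
		if cp.OOMBumpUpRatio != nil || cp.OOMMinBumpUp != nil {
			return fmt.Errorf("container %q sets OOM bump options, but the per-VPA OOM bump feature gate is disabled", cp.ContainerName)
		}
	}
	return nil
}

func main() {
	ratio := 1.5
	err := validatePerVPAFeatureFlag(&podResourcePolicy{
		ContainerPolicies: []containerPolicy{{ContainerName: "app", OOMBumpUpRatio: &ratio}},
	})
	fmt.Println(err) // non-nil while the gate is disabled
}
```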
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds new options to the Vertical Pod Autoscaler (VPA) to better handle Out of Memory (OOM) events.
It adds two new settings to the VPA configuration: OOMBumpUpRatio and OOMMinBumpUp.
These settings can be set for each container within a VPA's resource policy.
If they are not set for a specific container, the default values from the VPA recommender are used.
Example:
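A hedged illustration of the shape of the per-container settings; the field names below are assumptions based on the PR title and description, not necessarily the final API.

```go
// Hypothetical illustration only; field names are assumptions based on the PR title.
package main

import "fmt"

// oomPolicy captures the two per-container knobs described above.
type oomPolicy struct {
	ContainerName  string
	OOMBumpUpRatio float64 // multiplier applied to the memory recommendation after an OOM
	OOMMinBumpUp   int64   // minimum absolute memory bump in bytes after an OOM
}

func main() {
	// Bump "app" by a factor of 2 and by at least 256MiB after an OOM;
	// containers without such a policy keep the recommender's global defaults.
	p := oomPolicy{ContainerName: "app", OOMBumpUpRatio: 2.0, OOMMinBumpUp: 256 << 20}
	fmt.Printf("%+v\n", p)
}
```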
Which issue(s) this PR fixes:
part of #7650
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: