
damikag
Member

@damikag damikag commented Mar 31, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Update the scaleDownInCooldown definition so that having zero scale-down candidates is no longer treated as a reason to be in the scaleDownInCooldown state.
Emit the scale-down metric even when there are no scale-down candidates.
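A minimal Go sketch of the status decision described above, assuming a heavily simplified model: every genuine cooldown reason (recent scale-up, recent deletion, recent failure, and so on) is collapsed into one hypothetical `CooldownActive` flag, and `ScaleDownResult`, `LoopState`, `decideBefore`, `decideAfter` and `ScaleDownProceed` are illustrative names, not the cluster-autoscaler's actual types or functions. Only the `ScaleDownInCooldown` and `ScaleDownNoCandidates` status names come from this PR.

```go
package main

// Hypothetical, simplified model of the scale-down status decision.
// None of these names are the cluster-autoscaler's real API.

// ScaleDownResult stands in for the status reported each loop.
type ScaleDownResult string

const (
	ScaleDownInCooldown   ScaleDownResult = "ScaleDownInCooldown"
	ScaleDownNoCandidates ScaleDownResult = "ScaleDownNoCandidates"
	// ScaleDownProceed is a hypothetical placeholder for "go on to actuation".
	ScaleDownProceed ScaleDownResult = "ScaleDownProceed"
)

// LoopState collapses all non-candidate cooldown reasons into one flag.
type LoopState struct {
	CooldownActive bool
	Candidates     int
}

// decideBefore mirrors the old definition: zero candidates also counted
// as being in cooldown.
func decideBefore(s LoopState) ScaleDownResult {
	if s.CooldownActive || s.Candidates == 0 {
		return ScaleDownInCooldown
	}
	return ScaleDownProceed
}

// decideAfter mirrors the new definition: only genuine cooldown reasons
// count, and the zero-candidate case reports its own status while still
// returning before any actuation.
func decideAfter(s LoopState) ScaleDownResult {
	if s.CooldownActive {
		return ScaleDownInCooldown
	}
	if s.Candidates == 0 {
		return ScaleDownNoCandidates
	}
	return ScaleDownProceed
}
```

With zero candidates and no other cooldown reason, `decideBefore` reports `ScaleDownInCooldown` while `decideAfter` reports `ScaleDownNoCandidates`; in both versions the zero-candidate case stops before the actuation step.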

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- New `ScaleDownNoCandidates` status emitted instead of existing `ScaleDownInCooldown` when there are no candidates.
- `last_activity{activity=scaleDown}` metric will be updated even when there are no candidates.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/cluster-autoscaler needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 31, 2025
@k8s-ci-robot
Contributor

Hi @damikag. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from feiskyer and x13n March 31, 2025 14:46
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 31, 2025
@damikag damikag force-pushed the scale-down-slo-update-metric branch from 87cf274 to 49b271f Compare March 31, 2025 14:46
@jackfrancis
Contributor

@damikag this is a significant change to the existing behavior; can you describe why we would want to make this breaking behavioral change? Normally we wouldn't make a change like this without putting it behind a feature/config flag of some kind.

cc @towca @BigDarkClown

@damikag
Member Author

damikag commented Apr 7, 2025

My intention is to improve how the autoscaler emits metrics for scale down. At present, the autoscaler treats having no scale-down candidates as being in scale-down cooldown. This makes it hard for us to know every timestamp at which the autoscaler considered scaling down. This PR only adds an additional metric emission when there are no scale-down candidates; apart from that, the behavior is the same as before. Please correct me if I missed something @jackfrancis

@x13n
Copy link
Member

x13n commented Apr 11, 2025

I'm not sure if this requires a feature flag, but I'd be clear in the release note what is changing:

  • New ScaleDownNoCandidates status emitted instead of existing ScaleDownInCooldown when there are no candidates.
  • last_activity{activity=scaleDown} metric will be updated even when there are no candidates.

@jackfrancis - do you see this as a breaking change?

@jackfrancis
Contributor

@x13n @damikag the fact that the existing "scale down is in cooldown" status determination includes an explicit evaluation of "scale down candidates = 0" suggests that there is a reason for that being the case.

Totally reasonable to change that, but I'd want to understand why it was set that way in the first place.

It appears that a thread between @x13n and @vadasambar may provide a clue:

@x13n
Member

x13n commented Apr 14, 2025

Ah, good research; I forgot this discussion. So it looks like this was added as a flyby optimization, but:

  • This PR still prevents the case when there are no candidates from triggering scale down actuation logic
  • It is unclear how much of an actual benefit the optimization brought in the first place (maybe @vadasambar remembers?)

@jackfrancis
Contributor

  • This PR still prevents the case when there are no candidates from triggering scale down actuation logic

After further review, I'm convinced.

/hold for @x13n approval
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damikag, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2025
@jackfrancis
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 14, 2025
@x13n
Member

x13n commented Apr 14, 2025

Thanks! I'm ok with the PR, so removing the hold.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2025
@k8s-ci-robot k8s-ci-robot merged commit 18f10c1 into kubernetes:master Apr 14, 2025
7 checks passed
@jackfrancis
Contributor

/cherry-pick cluster-autoscaler-release-1.32

@k8s-infra-cherrypick-robot

@jackfrancis: new pull request created: #8105

In response to this:

/cherry-pick cluster-autoscaler-release-1.32


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

5 participants