Skip to content

Conversation

abdelrahman882
Copy link
Contributor

What type of PR is this?
/kind feature
/kind api-change

What this PR does / why we need it:
Built on top of the closed PR
Currently cluster-autoscaler is mostly synchronous. It creates node-groups and waits until creation is fully finished before scaling it up or running another scale-up iteration. Because node-group creation is a slow operation and AI workloads require scale-ups using a large number of node-groups, there is a need to introduce an asynchronous node-group creation to increase the throughput.

This PR introduces an additive api change for cloud providers to optionally implement asynchronous node-group creation.

Does this PR introduce a user-facing change?
A new "async-node-groups" experimental flag, that enables asynchronous node group creation and deletion. Before using the flag make sure that the cloud provider you're using supports async node-group operations.
/assign Daniel
/assign Pawel

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Aug 1, 2024
Copy link

linux-foundation-easycla bot commented Aug 1, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: pmendelski / name: Paweł Mendelski (c06ec4b)
  • ✅ login: abdelrahman882 / name: Abdelrahman Ahmed (e30bf14)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Aug 1, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @abdelrahman882!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 1, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @abdelrahman882. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 1, 2024
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@Shubham82
Copy link
Contributor

Hi @abdelrahman882
you have to sign the CLA before the PR can be reviewed.
See the following document to sign the CLA: Signing Contributor License Agreements(CLA)

@Shubham82
Copy link
Contributor

To check EasyCLA

/easycla

@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch from 976ae7a to 126ab98 Compare August 1, 2024 13:24
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 5, 2024
Copy link
Contributor

@pmendelski pmendelski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change! Left a couple of comments.

)

// AsyncNodeGroupStateCheker is responsible for checking the state of a node group
type AsyncNodeGroupStateCheker interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "AsyncNodeGroupStateCheker" -> "AsyncNodeGroupStateChecker"

type AsyncNodeGroupStateCheker interface {
// IsUpcoming checks if the node group is being asynchronously created, is scheduled to be
// asynchronously created or is being initiated after asynchronous creation. Upcoming node groups
// are reported as non-existing by the Exist method and are listed by the cloud provider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this line to:
"are reported as non-existing by the NodeGroup.Exist method, but are listed by the cloud provider."

CreateNodeGroup(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (CreateNodeGroupResult, errors.AutoscalerError)

// CreateNodeGroupAsync similar to CreateNodeGroup metho but creates node group asynchronously.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "metho" -> "method"

@@ -114,6 +118,7 @@ func (ap *AutoscalingProcessors) CleanUp() {
ap.ScaleDownStatusProcessor.CleanUp()
ap.AutoscalingStatusProcessor.CleanUp()
ap.NodeGroupManager.CleanUp()
ap.AsyncNodeGroupStateChecker.CleanUp()
Copy link
Contributor

@pmendelski pmendelski Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any real usage for AsyncNodeGroupStateChecker.CleanUp()?

@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
}
}

// BuildUpcomingNodeGroup returns an upcoming test node group.
func (tcp *TestCloudProvider) BuildUpcomingNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This method creates a non-existent NodeGroup. A node group wich returns false from Exist() does not guarantee it's an upcoming node group, it could also be created for simulations. TLDR; I would just another parameter to BuildNodeGroup (from what I see it's used 4 places) or rename the method to BuildNonExistentNodeGroup.

@@ -332,7 +333,7 @@ func (csr *ClusterStateRegistry) registerFailedScaleUpNoLock(nodeGroup cloudprov
}

// UpdateNodes updates the state of the nodes in the ClusterStateRegistry and recalculates the stats
func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time) error {
func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time, asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateCheker) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you inject AsyncNodeGroupStateChecker via NewClusterStateRegistry? This way the API would be clearer. There should be one checker during the CA instance lifetime.

@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch from 126ab98 to f9d508e Compare August 6, 2024 12:20
}

// New returns new instance of scale up executor.
func newScaleUpExecutor(
autoscalingContext *context.AutoscalingContext,
scaleStateNotifier nodegroupchange.NodeGroupChangeObserver,
asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateChecker,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late update, but I found one more typo: asyncNodeGroupStateCheker -> asyncNodeGroupStateChecker

@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch from f9d508e to c5a26e9 Compare August 12, 2024 08:23
@pmendelski
Copy link
Contributor

@abdelrahman882 thanks for the change
/lgtm

@k8s-ci-robot
Copy link
Contributor

@pmendelski: changing LGTM is restricted to collaborators

In response to this:

@abdelrahman882 thanks for the change
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
}
}

// BuildUpcomingNodeGroup returns an upcoming test node group.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference from BuildNodeGroup above seems to be the ability of setting exists to false. Given there's already NewNodeGroup and NewNodeGroupWithId all doing almost the same thing, I think it would greatly improve the extensibility if we settled on a single function to create a TestNodeGroup, leveraging option pattern to slip specific fields when needed in specific tests. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, will modify it to use one of the existing functions

@@ -0,0 +1,52 @@
/*
Copyright 2018 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2024?

fakeLogRecorder,
newBackoff(),
nodegroupconfig.NewDefaultNodeGroupConfigProcessor(config.NodeGroupAutoscalingOptions{MaxNodeProvisionTime: 15 * time.Minute}),
asyncnodegroups.NewDefaultAsyncNodeGroupStateChecker(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If NewDefaultAsyncNodeGroupStateChecker() is a no-op checker that always returns false on IsUpcoming() function call, what is this actually testing? Wouldn't the result be identical if AddNodeGroup was called instead of AddUpcomingNodeGroup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There 2 cases in clusterstate.GetUpcomingNodes(),

  1. updating upcomingCounts if csr.asyncNodeGroupStateChecker returns true for the nodegroup
  2. updating upcomingCounts based on csr.perNodeGroupReadiness with which equals csr.acceptableRanges[id].CurrentTarget - (len(readiness.Ready) + len(readiness.Unready) + len(readiness.LongUnregistered))

this is testing second case as we add upcoming node group ng at the beginning so we expect it to be upcoming even asyncNodeGroupStateChecker is returning false always

But anyways the other path indeed needs to be tested, will add node groups to test it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering your question, AddUpcomingNodeGroup is setting exists field to false which results in calculating upcomingCounts correctly using 2nd method, however if we used AddNodeGroup the exists will be true then 2nd method won't work so I will be mocking asyncNodeGroupStateChecker and add more test cases to clarify it

@@ -1380,3 +1384,31 @@ func TestTruncateIfExceedMaxSize(t *testing.T) {
})
}
}

func TestUpcomingNodesFromUpcomingNodeGroups(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to make this test table-based & add more test cases, for instance to demonstrate different behavior for upcoming vs real node groups? Combining with TestUpcomingNodes might make sense here.

klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
return err
}
if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can run a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) once per node group, no need to do it for every node.

}
err := a.ClusterSnapshot.AddNodeWithPods(upcomingNode.Node(), pods)
if err != nil {
klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err will be logged outside of this function as well, so I'd skip logging the error here and instead of:

klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
return err

just do

return fmt.Errorf("Failed to add upcoming node %s to cluster snapshot: %w", upcomingNode.Node().Name, err)

}
if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {
upcomingNodesFromUpcomingNodeGroups++
if n, found := upcomingNodeGroups[nodeGroup.Id()]; found {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if n, found := upcomingNodeGroups[nodeGroup.Id()]; found {
	n++
	upcomingNodeGroups[nodeGroup.Id()] = n
} else {
	upcomingNodeGroups[nodeGroup.Id()] = 1
}

is equivalent to much simpler

upcomingNodeGroups[nodeGroup.Id()] += 1


// IsUpcoming simulates checking if node group is upcoming.
func (p *MockAsyncNodeGroupStateChecker) IsUpcoming(nodeGroup cloudprovider.NodeGroup) bool {
val, exists := p.IsUpcomingNodeGroup[nodeGroup.Id()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this function could just return p.IsUpcomingNodeGroup[nodeGroup.Id()], since the default value of bool is false.

@@ -1664,6 +1665,110 @@ func TestScaleUpToMeetNodeGroupMinSize(t *testing.T) {
assert.Equal(t, "ng1", scaleUpStatus.ScaleUpInfos[0].Group.Id())
}

func TestScaleupAsyncNodeGroupsEnabledWithNoUpcomingNodegroup(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the new tests table-based? Ideally, combine with some of the existing ones if possible.

@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch 2 times, most recently from 0da80c3 to efb8f9c Compare August 20, 2024 09:34
@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch from efb8f9c to 4ebdcb9 Compare August 20, 2024 12:08
@abdelrahman882 abdelrahman882 requested a review from x13n August 20, 2024 12:24
Copy link
Member

@x13n x13n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good, just one minor comment. Putting PR on hold - feel free to /hold cancel if you disagree or don't want to make this change in this PR.

/lgtm
/approve
/hold

// BuildNodeGroup returns a test node group.
func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {
func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions, exists bool) *TestNodeGroup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this could be combined further with NewNodeGroup and NewNodeGroupWithId. All three could be combined into a single function with functional options to be set as needed (akin to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/test/test_utils.go#L36). That's a bit of extra refactoring, so leaving up to you whether or not to do this in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't modify NewNodeGroup unless I modify the interface cloudprovider.CloudProvider, so I kept it as it is and removed NewNodeGroupWithId to have only one BuildNodeGroup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Thanks for reducing the number of methods!

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abdelrahman882, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2024
@abdelrahman882 abdelrahman882 force-pushed the async-node-group-creation branch from 4ebdcb9 to e30bf14 Compare August 22, 2024 07:45
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2024
@x13n
Copy link
Member

x13n commented Aug 22, 2024

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 22, 2024
@k8s-ci-robot k8s-ci-robot merged commit d67f22b into kubernetes:master Aug 22, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants