Async node group creation #7103

abdelrahman882 · 2024-08-01T09:46:04Z

What type of PR is this?
/kind feature
/kind api-change

What this PR does / why we need it:
Built on top of the closed PR
Currently cluster-autoscaler is mostly synchronous. It creates node-groups and waits until creation is fully finished before scaling it up or running another scale-up iteration. Because node-group creation is a slow operation and AI workloads require scale-ups using a large number of node-groups, there is a need to introduce an asynchronous node-group creation to increase the throughput.

This PR introduces an additive api change for cloud providers to optionally implement asynchronous node-group creation.

Does this PR introduce a user-facing change?
A new "async-node-groups" experimental flag, that enables asynchronous node group creation and deletion. Before using the flag make sure that the cloud provider you're using supports async node-group operations.
/assign Daniel
/assign Pawel

linux-foundation-easycla · 2024-08-01T09:46:08Z

The committers listed above are authorized under a signed CLA.

✅ login: pmendelski / name: Paweł Mendelski (c06ec4b)
✅ login: abdelrahman882 / name: Abdelrahman Ahmed (e30bf14)

k8s-ci-robot · 2024-08-01T09:46:12Z

Welcome @abdelrahman882!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2024-08-01T09:46:13Z

Hi @abdelrahman882. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-triage-robot · 2024-08-01T09:59:32Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Shubham82 · 2024-08-01T10:32:57Z

Hi @abdelrahman882
you have to sign the CLA before the PR can be reviewed.
See the following document to sign the CLA: Signing Contributor License Agreements(CLA)

Shubham82 · 2024-08-01T10:33:21Z

To check EasyCLA

/easycla

pmendelski

Thanks for the change! Left a couple of comments.

pmendelski · 2024-08-05T10:54:45Z

cluster-autoscaler/processors/nodegroups/asyncnodegroups/asyncnodegroupstatechecker.go

+)
+
+// AsyncNodeGroupStateCheker is responsible for checking the state of a node group
+type AsyncNodeGroupStateCheker interface {


typo: "AsyncNodeGroupStateCheker" -> "AsyncNodeGroupStateChecker"

pmendelski · 2024-08-05T10:56:19Z

cluster-autoscaler/processors/nodegroups/asyncnodegroups/asyncnodegroupstatechecker.go

+type AsyncNodeGroupStateCheker interface {
+	// IsUpcoming checks if the node group is being asynchronously created, is scheduled to be
+	// asynchronously created or is being initiated after asynchronous creation. Upcoming node groups
+	// are reported as non-existing by the Exist method and are listed by the cloud provider.


Could you change this line to:
"are reported as non-existing by the NodeGroup.Exist method, but are listed by the cloud provider."

pmendelski · 2024-08-05T10:57:02Z

cluster-autoscaler/processors/nodegroups/nodegroup_manager.go

 	CreateNodeGroup(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (CreateNodeGroupResult, errors.AutoscalerError)
+
+	// CreateNodeGroupAsync similar to CreateNodeGroup metho but creates node group asynchronously.


typo: "metho" -> "method"

pmendelski · 2024-08-05T10:59:34Z

cluster-autoscaler/processors/processors.go

@@ -114,6 +118,7 @@ func (ap *AutoscalingProcessors) CleanUp() {
 	ap.ScaleDownStatusProcessor.CleanUp()
 	ap.AutoscalingStatusProcessor.CleanUp()
 	ap.NodeGroupManager.CleanUp()
+	ap.AsyncNodeGroupStateChecker.CleanUp()


Is there any real usage for AsyncNodeGroupStateChecker.CleanUp()?

pmendelski · 2024-08-05T11:48:32Z

cluster-autoscaler/cloudprovider/test/test_cloud_provider.go

@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
 	}
 }

+// BuildUpcomingNodeGroup returns an upcoming test node group.
+func (tcp *TestCloudProvider) BuildUpcomingNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {


nit: This method creates a non-existent NodeGroup. A node group wich returns false from Exist() does not guarantee it's an upcoming node group, it could also be created for simulations. TLDR; I would just another parameter to BuildNodeGroup (from what I see it's used 4 places) or rename the method to BuildNonExistentNodeGroup.

pmendelski · 2024-08-05T11:52:40Z

cluster-autoscaler/clusterstate/clusterstate.go

@@ -332,7 +333,7 @@ func (csr *ClusterStateRegistry) registerFailedScaleUpNoLock(nodeGroup cloudprov
 }

 // UpdateNodes updates the state of the nodes in the ClusterStateRegistry and recalculates the stats
-func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time) error {
+func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time, asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateCheker) error {


Could you inject AsyncNodeGroupStateChecker via NewClusterStateRegistry? This way the API would be clearer. There should be one checker during the CA instance lifetime.

pmendelski · 2024-08-12T07:54:46Z

cluster-autoscaler/core/scaleup/orchestrator/executor.go

 }

 // New returns new instance of scale up executor.
 func newScaleUpExecutor(
 	autoscalingContext *context.AutoscalingContext,
 	scaleStateNotifier nodegroupchange.NodeGroupChangeObserver,
+	asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateChecker,


sorry for the late update, but I found one more typo: asyncNodeGroupStateCheker -> asyncNodeGroupStateChecker

pmendelski · 2024-08-13T10:38:51Z

@abdelrahman882 thanks for the change
/lgtm

k8s-ci-robot · 2024-08-13T10:38:55Z

@pmendelski: changing LGTM is restricted to collaborators

In response to this:

@abdelrahman882 thanks for the change
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

x13n · 2024-08-14T13:17:31Z

cluster-autoscaler/cloudprovider/test/test_cloud_provider.go

@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
 	}
 }

+// BuildUpcomingNodeGroup returns an upcoming test node group.


The only difference from BuildNodeGroup above seems to be the ability of setting exists to false. Given there's already NewNodeGroup and NewNodeGroupWithId all doing almost the same thing, I think it would greatly improve the extensibility if we settled on a single function to create a TestNodeGroup, leveraging option pattern to slip specific fields when needed in specific tests. WDYT?

I agree, will modify it to use one of the existing functions

x13n · 2024-08-14T13:46:52Z

cluster-autoscaler/processors/nodegroups/asyncnodegroups/async_nodegroup_state_checker.go

@@ -0,0 +1,52 @@
+/*
+Copyright 2018 The Kubernetes Authors.


x13n · 2024-08-14T13:49:33Z

cluster-autoscaler/clusterstate/clusterstate_test.go

+		fakeLogRecorder,
+		newBackoff(),
+		nodegroupconfig.NewDefaultNodeGroupConfigProcessor(config.NodeGroupAutoscalingOptions{MaxNodeProvisionTime: 15 * time.Minute}),
+		asyncnodegroups.NewDefaultAsyncNodeGroupStateChecker(),


If NewDefaultAsyncNodeGroupStateChecker() is a no-op checker that always returns false on IsUpcoming() function call, what is this actually testing? Wouldn't the result be identical if AddNodeGroup was called instead of AddUpcomingNodeGroup?

There 2 cases in clusterstate.GetUpcomingNodes(),

updating upcomingCounts if csr.asyncNodeGroupStateChecker returns true for the nodegroup

updating upcomingCounts based on csr.perNodeGroupReadiness with which equals csr.acceptableRanges[id].CurrentTarget - (len(readiness.Ready) + len(readiness.Unready) + len(readiness.LongUnregistered))

this is testing second case as we add upcoming node group ng at the beginning so we expect it to be upcoming even asyncNodeGroupStateChecker is returning false always

But anyways the other path indeed needs to be tested, will add node groups to test it

Answering your question, AddUpcomingNodeGroup is setting exists field to false which results in calculating upcomingCounts correctly using 2nd method, however if we used AddNodeGroup the exists will be true then 2nd method won't work so I will be mocking asyncNodeGroupStateChecker and add more test cases to clarify it

x13n · 2024-08-14T13:52:13Z

cluster-autoscaler/clusterstate/clusterstate_test.go

@@ -1380,3 +1384,31 @@ func TestTruncateIfExceedMaxSize(t *testing.T) {
 		})
 	}
 }
+
+func TestUpcomingNodesFromUpcomingNodeGroups(t *testing.T) {


Would it be possible to make this test table-based & add more test cases, for instance to demonstrate different behavior for upcoming vs real node groups? Combining with TestUpcomingNodes might make sense here.

x13n · 2024-08-16T12:16:28Z

cluster-autoscaler/core/static_autoscaler.go

+				klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
+				return err
+			}
+			if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {


nit: You can run a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) once per node group, no need to do it for every node.

x13n · 2024-08-16T12:21:22Z

cluster-autoscaler/core/static_autoscaler.go

+			}
+			err := a.ClusterSnapshot.AddNodeWithPods(upcomingNode.Node(), pods)
+			if err != nil {
+				klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)


err will be logged outside of this function as well, so I'd skip logging the error here and instead of:

klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err) return err

just do

return fmt.Errorf("Failed to add upcoming node %s to cluster snapshot: %w", upcomingNode.Node().Name, err)

x13n · 2024-08-16T12:28:05Z

cluster-autoscaler/core/static_autoscaler.go

+			}
+			if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {
+				upcomingNodesFromUpcomingNodeGroups++
+				if n, found := upcomingNodeGroups[nodeGroup.Id()]; found {


if n, found := upcomingNodeGroups[nodeGroup.Id()]; found { n++ upcomingNodeGroups[nodeGroup.Id()] = n } else { upcomingNodeGroups[nodeGroup.Id()] = 1 }

is equivalent to much simpler

upcomingNodeGroups[nodeGroup.Id()] += 1

x13n · 2024-08-16T12:37:39Z

cluster-autoscaler/core/test/common.go

+
+// IsUpcoming simulates checking if node group is upcoming.
+func (p *MockAsyncNodeGroupStateChecker) IsUpcoming(nodeGroup cloudprovider.NodeGroup) bool {
+	val, exists := p.IsUpcomingNodeGroup[nodeGroup.Id()]


nit: this function could just return p.IsUpcomingNodeGroup[nodeGroup.Id()], since the default value of bool is false.

x13n · 2024-08-16T12:43:29Z

cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go

@@ -1664,6 +1665,110 @@ func TestScaleUpToMeetNodeGroupMinSize(t *testing.T) {
 	assert.Equal(t, "ng1", scaleUpStatus.ScaleUpInfos[0].Group.Id())
 }

+func TestScaleupAsyncNodeGroupsEnabledWithNoUpcomingNodegroup(t *testing.T) {


Can you make the new tests table-based? Ideally, combine with some of the existing ones if possible.

x13n

Thanks, looks good, just one minor comment. Putting PR on hold - feel free to /hold cancel if you disagree or don't want to make this change in this PR.

/lgtm
/approve
/hold

x13n · 2024-08-21T13:17:59Z

cluster-autoscaler/cloudprovider/test/test_cloud_provider.go

 // BuildNodeGroup returns a test node group.
-func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {
+func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions, exists bool) *TestNodeGroup {


I meant this could be combined further with NewNodeGroup and NewNodeGroupWithId. All three could be combined into a single function with functional options to be set as needed (akin to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/test/test_utils.go#L36). That's a bit of extra refactoring, so leaving up to you whether or not to do this in this PR.

Couldn't modify NewNodeGroup unless I modify the interface cloudprovider.CloudProvider, so I kept it as it is and removed NewNodeGroupWithId to have only one BuildNodeGroup

Ah, good point. Thanks for reducing the number of methods!

k8s-ci-robot · 2024-08-21T13:38:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abdelrahman882, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [x13n]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

x13n · 2024-08-22T08:47:51Z

/lgtm
/hold cancel

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Aug 1, 2024

k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Aug 1, 2024

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 1, 2024

k8s-ci-robot added the area/cluster-autoscaler label Aug 1, 2024

k8s-ci-robot requested review from BigDarkClown and x13n August 1, 2024 09:46

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 1, 2024

abdelrahman882 force-pushed the async-node-group-creation branch from 976ae7a to 126ab98 Compare August 1, 2024 13:24

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 5, 2024

pmendelski reviewed Aug 5, 2024

View reviewed changes

abdelrahman882 force-pushed the async-node-group-creation branch from 126ab98 to f9d508e Compare August 6, 2024 12:20

pmendelski reviewed Aug 12, 2024

View reviewed changes

abdelrahman882 force-pushed the async-node-group-creation branch from f9d508e to c5a26e9 Compare August 12, 2024 08:23

x13n requested changes Aug 16, 2024

View reviewed changes

abdelrahman882 force-pushed the async-node-group-creation branch 2 times, most recently from 0da80c3 to efb8f9c Compare August 20, 2024 09:34

Add async node group creation

c06ec4b

abdelrahman882 force-pushed the async-node-group-creation branch from efb8f9c to 4ebdcb9 Compare August 20, 2024 12:08

abdelrahman882 requested a review from x13n August 20, 2024 12:24

x13n reviewed Aug 21, 2024

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2024

k8s-ci-robot assigned x13n Aug 21, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2024

Add upcoming node groups state checker

e30bf14

abdelrahman882 force-pushed the async-node-group-creation branch from 4ebdcb9 to e30bf14 Compare August 22, 2024 07:45

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2024

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 22, 2024

k8s-ci-robot merged commit d67f22b into kubernetes:master Aug 22, 2024
6 checks passed

abdelrahman882 mentioned this pull request Apr 13, 2025

REQUEST: New membership for abdelrahman882 kubernetes/org#5534

Closed

11 tasks

		CreateNodeGroup(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (CreateNodeGroupResult, errors.AutoscalerError)

		// CreateNodeGroupAsync similar to CreateNodeGroup metho but creates node group asynchronously.

Async node group creation #7103

Async node group creation #7103

Uh oh!

Conversation

abdelrahman882 commented Aug 1, 2024

Uh oh!

linux-foundation-easycla bot commented Aug 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Aug 1, 2024

Uh oh!

k8s-ci-robot commented Aug 1, 2024

Uh oh!

k8s-triage-robot commented Aug 1, 2024

Uh oh!

Shubham82 commented Aug 1, 2024

Uh oh!

Shubham82 commented Aug 1, 2024

Uh oh!

pmendelski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmendelski Aug 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmendelski commented Aug 13, 2024

Uh oh!

k8s-ci-robot commented Aug 13, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

x13n left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k8s-ci-robot commented Aug 21, 2024

Uh oh!

x13n commented Aug 22, 2024

Uh oh!

Uh oh!

linux-foundation-easycla bot commented Aug 1, 2024 •

edited

Loading

pmendelski Aug 5, 2024 •

edited

Loading