Async node group creation #7103
Conversation
Welcome @abdelrahman882!
Hi @abdelrahman882. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by the corresponding label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
Hi @abdelrahman882
To check EasyCLA: /easycla
(force-pushed 976ae7a to 126ab98)
Thanks for the change! Left a couple of comments.
```go
)

// AsyncNodeGroupStateCheker is responsible for checking the state of a node group
type AsyncNodeGroupStateCheker interface {
```
typo: `AsyncNodeGroupStateCheker` -> `AsyncNodeGroupStateChecker`
```go
type AsyncNodeGroupStateCheker interface {
	// IsUpcoming checks if the node group is being asynchronously created, is scheduled to be
	// asynchronously created or is being initiated after asynchronous creation. Upcoming node groups
	// are reported as non-existing by the Exist method and are listed by the cloud provider.
```
Could you change this line to:
"are reported as non-existing by the NodeGroup.Exist method, but are listed by the cloud provider."
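Taken together, the naming and comment fixes suggest an interface along these lines. This is a compilable sketch with stand-in types, not the PR's exact code: `NodeGroup`, `NoOpAsyncNodeGroupStateChecker`, and `testGroup` are assumptions for illustration.

```go
package main

import "fmt"

// NodeGroup is a stand-in for cloudprovider.NodeGroup (assumption for this sketch).
type NodeGroup interface {
	Id() string
}

// AsyncNodeGroupStateChecker mirrors the corrected interface name from the review.
type AsyncNodeGroupStateChecker interface {
	// IsUpcoming reports whether the node group is being asynchronously created,
	// is scheduled to be created, or is being initiated after asynchronous creation.
	IsUpcoming(nodeGroup NodeGroup) bool
	CleanUp()
}

// NoOpAsyncNodeGroupStateChecker is a default checker that treats no group as
// upcoming, matching the default-checker behavior discussed later in the review.
type NoOpAsyncNodeGroupStateChecker struct{}

func (NoOpAsyncNodeGroupStateChecker) IsUpcoming(NodeGroup) bool { return false }
func (NoOpAsyncNodeGroupStateChecker) CleanUp()                  {}

type testGroup struct{ id string }

func (g testGroup) Id() string { return g.id }

func main() {
	var checker AsyncNodeGroupStateChecker = NoOpAsyncNodeGroupStateChecker{}
	fmt.Println(checker.IsUpcoming(testGroup{id: "ng-1"})) // false
}
```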
```go
CreateNodeGroup(context *context.AutoscalingContext, nodeGroup cloudprovider.NodeGroup) (CreateNodeGroupResult, errors.AutoscalerError)

// CreateNodeGroupAsync similar to CreateNodeGroup metho but creates node group asynchronously.
```
typo: "metho" -> "method"
```diff
@@ -114,6 +118,7 @@ func (ap *AutoscalingProcessors) CleanUp() {
 	ap.ScaleDownStatusProcessor.CleanUp()
 	ap.AutoscalingStatusProcessor.CleanUp()
 	ap.NodeGroupManager.CleanUp()
+	ap.AsyncNodeGroupStateChecker.CleanUp()
```
Is there any real usage for `AsyncNodeGroupStateChecker.CleanUp()`?
```diff
@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
 	}
 }
 
+// BuildUpcomingNodeGroup returns an upcoming test node group.
+func (tcp *TestCloudProvider) BuildUpcomingNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {
```
nit: This method creates a non-existent NodeGroup. A node group which returns false from `Exist()` is not guaranteed to be an upcoming node group; it could also be created for simulations. TL;DR: I would just add another parameter to `BuildNodeGroup` (from what I see it's used in 4 places) or rename the method to `BuildNonExistentNodeGroup`.
```diff
@@ -332,7 +333,7 @@ func (csr *ClusterStateRegistry) registerFailedScaleUpNoLock(nodeGroup cloudprov
 }
 
 // UpdateNodes updates the state of the nodes in the ClusterStateRegistry and recalculates the stats
-func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time) error {
+func (csr *ClusterStateRegistry) UpdateNodes(nodes []*apiv1.Node, nodeInfosForGroups map[string]*schedulerframework.NodeInfo, currentTime time.Time, asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateCheker) error {
```
Could you inject AsyncNodeGroupStateChecker via NewClusterStateRegistry? This way the API would be clearer. There should be one checker during the CA instance lifetime.
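The suggested constructor injection could be sketched like this. A minimal standalone sketch with stand-in types; the struct fields and the simplified `UpdateNodes` signature here are assumptions, not the registry's real code.

```go
package main

import "fmt"

// AsyncNodeGroupStateChecker is a trimmed-down stand-in for the real interface (assumption).
type AsyncNodeGroupStateChecker interface {
	IsUpcoming(nodeGroupID string) bool
}

type noOpChecker struct{}

func (noOpChecker) IsUpcoming(string) bool { return false }

// ClusterStateRegistry sketch: the checker is injected once at construction time,
// since there is one checker for the whole CA instance lifetime.
type ClusterStateRegistry struct {
	asyncNodeGroupStateChecker AsyncNodeGroupStateChecker
}

func NewClusterStateRegistry(checker AsyncNodeGroupStateChecker) *ClusterStateRegistry {
	return &ClusterStateRegistry{asyncNodeGroupStateChecker: checker}
}

// UpdateNodes uses the injected checker instead of taking one per call,
// keeping the public API signature unchanged.
func (csr *ClusterStateRegistry) UpdateNodes(nodeGroupIDs []string) int {
	upcoming := 0
	for _, id := range nodeGroupIDs {
		if csr.asyncNodeGroupStateChecker.IsUpcoming(id) {
			upcoming++
		}
	}
	return upcoming
}

func main() {
	csr := NewClusterStateRegistry(noOpChecker{})
	fmt.Println(csr.UpdateNodes([]string{"ng-1", "ng-2"})) // 0
}
```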
(force-pushed 126ab98 to f9d508e)
```go
}

// New returns new instance of scale up executor.
func newScaleUpExecutor(
	autoscalingContext *context.AutoscalingContext,
	scaleStateNotifier nodegroupchange.NodeGroupChangeObserver,
	asyncNodeGroupStateCheker asyncnodegroups.AsyncNodeGroupStateChecker,
```
sorry for the late update, but I found one more typo: `asyncNodeGroupStateCheker` -> `asyncNodeGroupStateChecker`
(force-pushed f9d508e to c5a26e9)
@abdelrahman882 thanks for the change
@pmendelski: changing LGTM is restricted to collaborators. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```diff
@@ -254,12 +254,33 @@ func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, auto
 	}
 }
 
+// BuildUpcomingNodeGroup returns an upcoming test node group.
```
The only difference from `BuildNodeGroup` above seems to be the ability of setting `exists` to `false`. Given there's already `NewNodeGroup` and `NewNodeGroupWithId` all doing almost the same thing, I think it would greatly improve the extensibility if we settled on a single function to create a `TestNodeGroup`, leveraging the option pattern to set specific fields when needed in specific tests. WDYT?
I agree, will modify it to use one of the existing functions
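The functional-options refactoring being discussed might be sketched as follows. The field names, option helpers, and defaults here are assumptions for illustration, not the autoscaler's actual test utilities.

```go
package main

import "fmt"

// TestNodeGroup is a trimmed stand-in for the autoscaler's test node group (assumption).
type TestNodeGroup struct {
	id     string
	min    int
	max    int
	size   int
	exists bool
}

// Option is a functional option, akin to the pattern in utils/test/test_utils.go.
type Option func(*TestNodeGroup)

func WithSizeLimits(min, max int) Option {
	return func(ng *TestNodeGroup) { ng.min, ng.max = min, max }
}

// WithExists lets specific tests build a non-existent (e.g. upcoming) node group.
func WithExists(exists bool) Option {
	return func(ng *TestNodeGroup) { ng.exists = exists }
}

// BuildNodeGroup is the single constructor; defaults cover the common case
// and options slip in specific fields only where a test needs them.
func BuildNodeGroup(id string, size int, opts ...Option) *TestNodeGroup {
	ng := &TestNodeGroup{id: id, min: 0, max: 10, size: size, exists: true}
	for _, opt := range opts {
		opt(ng)
	}
	return ng
}

func main() {
	ng := BuildNodeGroup("ng-1", 3, WithExists(false))
	fmt.Println(ng.id, ng.size, ng.exists) // ng-1 3 false
}
```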
```diff
@@ -0,0 +1,52 @@
+/*
+Copyright 2018 The Kubernetes Authors.
```
nit: 2024?
```go
	fakeLogRecorder,
	newBackoff(),
	nodegroupconfig.NewDefaultNodeGroupConfigProcessor(config.NodeGroupAutoscalingOptions{MaxNodeProvisionTime: 15 * time.Minute}),
	asyncnodegroups.NewDefaultAsyncNodeGroupStateChecker(),
```
If `NewDefaultAsyncNodeGroupStateChecker()` is a no-op checker that always returns `false` on the `IsUpcoming()` function call, what is this actually testing? Wouldn't the result be identical if `AddNodeGroup` was called instead of `AddUpcomingNodeGroup`?
There are 2 cases in `clusterstate.GetUpcomingNodes()`:

- updating `upcomingCounts` if `csr.asyncNodeGroupStateChecker` returns true for the node group
- updating `upcomingCounts` based on `csr.perNodeGroupReadiness`, which equals `csr.acceptableRanges[id].CurrentTarget - (len(readiness.Ready) + len(readiness.Unready) + len(readiness.LongUnregistered))`

This is testing the second case: we add the upcoming node group `ng` at the beginning, so we expect it to be upcoming even though `asyncNodeGroupStateChecker` always returns false.

But anyway, the other path indeed needs to be tested; I will add node groups to test it.
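The readiness-based arithmetic quoted above can be sketched as a standalone toy. The `Readiness` struct here is an assumption modeled on the quoted formula, and flooring at zero is an assumed safeguard rather than a statement about the registry's exact behavior.

```go
package main

import "fmt"

// Readiness mirrors the per-node-group readiness buckets referenced above
// (field names assumed from the quoted formula).
type Readiness struct {
	Ready, Unready, LongUnregistered []string
}

// upcomingCount reproduces the readiness-based formula from the comment:
// CurrentTarget - (Ready + Unready + LongUnregistered), floored at zero here
// so a shrinking target never yields a negative count.
func upcomingCount(currentTarget int, r Readiness) int {
	upcoming := currentTarget - (len(r.Ready) + len(r.Unready) + len(r.LongUnregistered))
	if upcoming < 0 {
		return 0
	}
	return upcoming
}

func main() {
	r := Readiness{Ready: []string{"n1", "n2"}, Unready: []string{"n3"}}
	fmt.Println(upcomingCount(5, r)) // 2
}
```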
Answering your question: `AddUpcomingNodeGroup` sets the `exists` field to `false`, which results in calculating `upcomingCounts` correctly using the 2nd method. However, if we used `AddNodeGroup`, `exists` would be `true` and the 2nd method wouldn't work, so I will mock `asyncNodeGroupStateChecker` and add more test cases to clarify it.
```diff
@@ -1380,3 +1384,31 @@ func TestTruncateIfExceedMaxSize(t *testing.T) {
 		})
 	}
 }
+
+func TestUpcomingNodesFromUpcomingNodeGroups(t *testing.T) {
```
Would it be possible to make this test table-based & add more test cases, for instance to demonstrate different behavior for upcoming vs real node groups? Combining with `TestUpcomingNodes` might make sense here.
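A table-based layout for such cases might look like the following sketch. It uses a plain `main` instead of `testing.T` so it runs standalone; in a real suite each case would be a `t.Run` subtest. `isUpcoming` is a hypothetical helper, not the autoscaler's code.

```go
package main

import "fmt"

// isUpcoming is a hypothetical predicate under test: a group counts as upcoming
// when it does not exist yet or the async checker flags it (assumption for this sketch).
func isUpcoming(exists, asyncFlagged bool) bool {
	return !exists || asyncFlagged
}

func main() {
	// Table-based layout: each case carries a name, inputs, and the expected
	// result, making it cheap to put upcoming vs real node group scenarios
	// side by side and to add new cases later.
	cases := []struct {
		name                 string
		exists, asyncFlagged bool
		want                 bool
	}{
		{"real node group", true, false, false},
		{"upcoming via async checker", true, true, true},
		{"upcoming via non-existence", false, false, true},
	}
	for _, tc := range cases {
		got := isUpcoming(tc.exists, tc.asyncFlagged)
		fmt.Printf("%s: got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```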
```go
		klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
		return err
	}
	if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {
```
nit: You can run `a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup)` once per node group; no need to do it for every node.
```go
	}
	err := a.ClusterSnapshot.AddNodeWithPods(upcomingNode.Node(), pods)
	if err != nil {
		klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
```
`err` will be logged outside of this function as well, so I'd skip logging the error here and instead of:

```go
klog.Errorf("Failed to add upcoming node %s to cluster snapshot: %v", upcomingNode.Node().Name, err)
return err
```

just do

```go
return fmt.Errorf("Failed to add upcoming node %s to cluster snapshot: %w", upcomingNode.Node().Name, err)
```
```go
	}
	if a.processors.AsyncNodeGroupStateChecker.IsUpcoming(nodeGroup) {
		upcomingNodesFromUpcomingNodeGroups++
		if n, found := upcomingNodeGroups[nodeGroup.Id()]; found {
```
```go
if n, found := upcomingNodeGroups[nodeGroup.Id()]; found {
	n++
	upcomingNodeGroups[nodeGroup.Id()] = n
} else {
	upcomingNodeGroups[nodeGroup.Id()] = 1
}
```

is equivalent to the much simpler

```go
upcomingNodeGroups[nodeGroup.Id()] += 1
```
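The simplification works because indexing a Go map with a missing key yields the element type's zero value (0 for `int`), so the found/else branching collapses into a single increment. A quick standalone demonstration:

```go
package main

import "fmt"

func main() {
	upcomingNodeGroups := map[string]int{}

	// Indexing a missing key yields the zero value, so the first += 1
	// on a new key starts from 0 with no existence check needed.
	for _, id := range []string{"ng-1", "ng-2", "ng-1", "ng-1"} {
		upcomingNodeGroups[id] += 1
	}
	fmt.Println(upcomingNodeGroups["ng-1"], upcomingNodeGroups["ng-2"]) // 3 1
}
```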
```go
// IsUpcoming simulates checking if node group is upcoming.
func (p *MockAsyncNodeGroupStateChecker) IsUpcoming(nodeGroup cloudprovider.NodeGroup) bool {
	val, exists := p.IsUpcomingNodeGroup[nodeGroup.Id()]
```
nit: this function could just `return p.IsUpcomingNodeGroup[nodeGroup.Id()]`, since the default value of bool is `false`.
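The same zero-value rule applies to `bool` maps, which is why the `val, exists` two-value lookup is unnecessary for the mock. A quick standalone demonstration:

```go
package main

import "fmt"

func main() {
	isUpcomingNodeGroup := map[string]bool{"ng-upcoming": true}

	// A lookup on a missing key returns the zero value for bool (false),
	// so a single-value index expression already behaves like the mock needs.
	fmt.Println(isUpcomingNodeGroup["ng-upcoming"]) // true
	fmt.Println(isUpcomingNodeGroup["ng-real"])     // false
}
```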
```diff
@@ -1664,6 +1665,110 @@ func TestScaleUpToMeetNodeGroupMinSize(t *testing.T) {
 	assert.Equal(t, "ng1", scaleUpStatus.ScaleUpInfos[0].Group.Id())
 }
 
+func TestScaleupAsyncNodeGroupsEnabledWithNoUpcomingNodegroup(t *testing.T) {
```
Can you make the new tests table-based? Ideally, combine with some of the existing ones if possible.
(force-pushed 0da80c3 to efb8f9c)
(force-pushed efb8f9c to 4ebdcb9)
Thanks, looks good, just one minor comment. Putting PR on hold - feel free to `/hold cancel` if you disagree or don't want to make this change in this PR.
/lgtm
/approve
/hold
```diff
 // BuildNodeGroup returns a test node group.
-func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions) *TestNodeGroup {
+func (tcp *TestCloudProvider) BuildNodeGroup(id string, min, max, size int, autoprovisioned bool, machineType string, opts *config.NodeGroupAutoscalingOptions, exists bool) *TestNodeGroup {
```
I meant this could be combined further with `NewNodeGroup` and `NewNodeGroupWithId`. All three could be combined into a single function with functional options to be set as needed (akin to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/test/test_utils.go#L36). That's a bit of extra refactoring, so leaving up to you whether or not to do this in this PR.
Couldn't modify `NewNodeGroup` unless I modify the interface `cloudprovider.CloudProvider`, so I kept it as it is and removed `NewNodeGroupWithId` to have only one `BuildNodeGroup`.
Ah, good point. Thanks for reducing the number of methods!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abdelrahman882, x13n. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
(force-pushed 4ebdcb9 to e30bf14)
/lgtm
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Built on top of the closed PR
Currently cluster-autoscaler is mostly synchronous: it creates node groups and waits until creation fully finishes before scaling them up or running another scale-up iteration. Because node-group creation is a slow operation and AI workloads require scale-ups using a large number of node groups, there is a need to introduce asynchronous node-group creation to increase throughput.
This PR introduces an additive api change for cloud providers to optionally implement asynchronous node-group creation.
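As a rough illustration of what an asynchronous creation path enables, the following toy sketch kicks off creation and returns immediately, delivering the outcome later so scale-up loops keep running. All types and the channel-based shape here are made up for illustration; the PR's real API is an additive change to the cloud provider and processor interfaces, not this code.

```go
package main

import "fmt"

// NodeGroup and CreateNodeGroupResult are trimmed stand-ins; the real types
// live in the cluster-autoscaler cloudprovider and processor packages.
type NodeGroup struct{ ID string }
type CreateNodeGroupResult struct{ MainCreatedNodeGroup NodeGroup }

// AsyncNodeGroupCreationResult models the result delivered once creation
// finishes (name and shape are assumptions, not the PR's exact types).
type AsyncNodeGroupCreationResult struct {
	Result CreateNodeGroupResult
	Err    error
}

// createNodeGroupAsync returns immediately and runs the slow creation in a
// goroutine, sending the outcome on a buffered channel when done.
func createNodeGroupAsync(ng NodeGroup) <-chan AsyncNodeGroupCreationResult {
	out := make(chan AsyncNodeGroupCreationResult, 1)
	go func() {
		// A real cloud provider would issue the (slow) creation call here.
		out <- AsyncNodeGroupCreationResult{Result: CreateNodeGroupResult{MainCreatedNodeGroup: ng}}
		close(out)
	}()
	return out
}

func main() {
	res := <-createNodeGroupAsync(NodeGroup{ID: "ng-async"})
	fmt.Println(res.Result.MainCreatedNodeGroup.ID, res.Err == nil) // ng-async true
}
```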
Does this PR introduce a user-facing change?
A new `async-node-groups` experimental flag that enables asynchronous node group creation and deletion. Before using the flag, make sure that the cloud provider you're using supports async node-group operations.
/assign Daniel
/assign Pawel