Commit a28043c

Bump to beta
Signed-off-by: kerthcet <[email protected]>
1 parent 3dee68d commit a28043c

3 files changed: +96 -35 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 961
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Lines changed: 65 additions & 19 deletions
@@ -1,6 +1,5 @@
 # KEP-961: Implement maxUnavailable in StatefulSet
 
-
 <!--
 This is the title of your KEP. Keep it short, simple, and descriptive. A good
 title can help communicate what the KEP is and should be considered as part of
@@ -19,23 +18,25 @@ tags, and then generate with `hack/update-toc.sh`.
 
 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
+- [Table of Contents](#table-of-contents)
 - [Summary](#summary)
 - [Motivation](#motivation)
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
-- [User Stories (Optional)](#user-stories-optional)
+- [User Stories](#user-stories)
 - [Story 1](#story-1)
-- [Story 2](#story-2)
 - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
+- [Implementation Details](#implementation-details)
+- [API Changes](#api-changes)
+- [Implementation](#implementation)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
-- [Unit tests](#unit-tests)
-- [Integration tests](#integration-tests)
-- [e2e tests](#e2e-tests)
-- [Graduation Criteria](#graduation-criteria)
+- [Tests](#tests)
+- [Test Plan](#test-plan-1)
+- [Graduation Criteria](#graduation-criteria)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -206,7 +207,7 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion
 and make progress.
 -->
 
-N/A
+None.
 
 ## Proposal
 
@@ -458,9 +459,12 @@ No.
 - maxUnavailable greater than 1 with partition and maxUnavailable greater than replicas
 
 #### Test Plan
-For `Alpha`, unit tests and e2e tests will be added to test functionality at both
+
+For `Alpha`, unit tests and integration tests will be added to test functionality at both
 with feature flag enabled and disabled. Defaults will be verified so that users
-who donot set this flag are not surprised at all.
+who do not set this flag are not surprised at all.
+
+For `Beta`, add e2e tests.
 
 ## Graduation Criteria
 
@@ -604,11 +608,16 @@ maxUnavailable to a number greater than 1, but the invariants and the logic wil
 maxUnavailable pods with the same identity and never more than maxUnavailable being deleted.
 
 ###### What specific metrics should inform a rollback?
-TODO when we reach Beta
+
+Consider a rollback if, with the feature enabled, a rolling update behaves unexpectedly: the number of pods updated at a time
+does not equal the `maxUnavailable` value, or pods are updated in an unexpected order.
+
+The `rolling-update-duration` metric is another signal: if it does not decrease after setting `maxUnavailable`
+greater than 1, or the duration increases abnormally, then we should roll back.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
-Will be tested when graduating to Beta.
 
+No, but it will be tested manually before merging the PR.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 No
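
Reviewer sketch: the duration-based rollback signal described in this hunk could be expressed as a simple comparison. This is a minimal Python illustration, not part of the commit. The function name `should_consider_rollback`, the idea of a baseline duration measured with `maxUnavailable` at 1, and the use of seconds are all assumptions; how the `rolling-update-duration` values are collected is not shown.

```python
def should_consider_rollback(
    baseline_seconds: float,
    observed_seconds: float,
    max_unavailable: int,
) -> bool:
    """Return True when the duration signal suggests a rollback (illustrative only)."""
    if max_unavailable <= 1:
        # Assumption: with maxUnavailable at 1 no speed-up is expected,
        # so this particular check does not apply.
        return False
    # A rollout that is not faster than the baseline is treated as suspicious.
    return observed_seconds >= baseline_seconds
```

For example, a rollout that took 650 seconds against a 600-second baseline with `maxUnavailable` set to 3 would be flagged, while one that took 200 seconds would not.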
@@ -623,32 +632,43 @@ The below command should show maxUnavailable value:
 kubectl get statefulsets -o yaml | grep maxUnavailable
 ```
 
+Alternatively, check that the new `rolling-update-duration` metric exists.
+
 ###### How can someone using this feature know that it is working for their instance?
-TODO when we reach Beta
+
+With the feature enabled, set `maxUnavailable` greater than 1 and watch the number of pods updated at a time during a rolling update;
+it should equal `maxUnavailable`.
+Alternatively, when `maxUnavailable` is set greater than 1, the `rolling-update-duration` should decrease.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
+This feature has little relevance to SLOs, although a rolling update that proceeds at a very low speed can impact running services.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
+None.
+
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
+None.
+
 ### Dependencies
 
 ###### Does this feature depend on any specific services running in the cluster?
-NA
+No.
 
 ### Scalability
 
 ###### Will enabling / using this feature result in any new API calls?
-It doesnt make any extra API calls.
+
+It doesn't make any extra API calls.
 
 ###### Will enabling / using this feature result in introducing new API types?
 No
 
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 No
 
-
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 A struct gets added to every StatefulSet object which has three fields, one 32 bit integer and two fields of type string.
 The struct in question is IntOrString.
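
Reviewer sketch: because `maxUnavailable` is an IntOrString, it can carry either an absolute pod count or a percentage string. The Python below is a minimal illustration of resolving such a value against a replica count, not the controller's implementation. The function name `resolve_max_unavailable`, the `round_up` parameter, and the floor of one pod are assumptions made for this example; the actual rounding behavior should be checked against the Kubernetes API reference.

```python
from typing import Union


def resolve_max_unavailable(
    value: Union[int, str],
    replicas: int,
    round_up: bool = True,
) -> int:
    """Resolve an int-or-percentage value to an absolute pod count (illustrative only)."""
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage such as '25%', got {value!r}")
        numerator = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        count = -(-numerator // 100) if round_up else numerator // 100
    else:
        count = value
    # Assumption: never resolve to fewer than one pod, so an update can always progress.
    return max(1, count)
```

With 5 replicas, `resolve_max_unavailable("50%", 5)` yields 3 when rounding up and 2 when rounding down, which is why the rounding direction is exposed as a parameter here rather than fixed.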
@@ -661,25 +681,49 @@ The controller-manager will see very negligible and almost un-notoceable increas
 
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
 
+No.
+
 ### Troubleshooting
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 The RollingUpdate will fail or will not be able to proceed if etcd or apiserver is unavailable and
 hence this feature will also be not be able to be used.
 
 ###### What are other known failure modes?
-NA
+
+<!--
+For each of them, fill in the following information by copying the below template:
+- [Failure mode brief description]
+- Detection: How can it be detected via metrics? Stated another way:
+how can an operator troubleshoot without logging into a master or worker node?
+- Mitigations: What can be done to stop the bleeding, especially for already
+running user workloads?
+- Diagnostics: What are the useful log messages and their required logging
+levels that could help debug the issue?
+Not required until feature graduated to beta.
+- Testing: Are there any tests for failure mode? If not, describe why.
+-->
+
+In a multi-master setup, when the cluster has skewed CCM, the behavior may differ.
+
+- [Failure mode brief description]
+- Detection: the `rolling-update-duration` did not decrease after setting `maxUnavailable` greater than 1, or it increased abnormally.
+- Mitigations: Disable the feature.
+- Diagnostics: Set the logger level greater than 4.
+- Testing: No testing, because the rolling update duration is hard to measure; it can be affected by many things,
+such as the master performance.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
 ## Implementation History
 
 - KEP Started on 1/1/2019
 - Implementation PR and UT by 8/30
+- Bump to beta on 2023-05-11
 
 ## Drawbacks
 
-NA
+None.
 
 ## Alternatives
 
@@ -689,4 +733,6 @@ section.
 - Another alternative would be to use OnDelete and deploy your own Custom Controller on top of StatefulSet Pods. There you can implement
 your own logic for deleting more than one pods in a specific order. This requires more work on the user but give them ultimate flexibility.
 
-## Infrastructure Needed (Optional)
+## Infrastructure Needed (Optional)
+
+No.

keps/sig-apps/961-maxunavailable-for-statefulset/kep.yaml

Lines changed: 29 additions & 16 deletions
@@ -2,30 +2,43 @@ title: Implement maxUnavailable for StatefulSets
 kep-number: 961
 authors:
   - "@krmayankk"
+  - "@kerthcet"
 owning-sig: sig-apps
-participating-sigs:
-  - sig-apps
+participating-sigs: []
+status: implementable
+creation-date: 2018-12-29
 reviewers:
   - "@janetkuo"
   - "@kow3ns"
 approvers:
   - "@janetkuo"
   - "@kow3ns"
-editor: TBD
-creation-date: 2018-12-29
-last-updated: 2022-01-01
-status: implementable
-see-also:
-  - n/a
-replaces:
-  - n/a
-superseded-by:
-  - n/a
+
+see-also: []
+replaces: []
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: beta
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.28"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.24"
-  beta: "v1.25"
-  stable: "v1.26"
-latest-milestone: "v1.24"
-stage: "alpha"
+  beta: "v1.28"
+  stable: TBD
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: MaxUnavailableStatefulSet
+    components:
+      - kube-controller-manager
+      - kube-apiserver
+
+# The following PRR answers are required at beta release
+metrics:
+  - rolling-update-duration
