
Commit e9eae01

address comments

Signed-off-by: kerthcet <[email protected]>
1 parent 8e5ecaa

File tree: 2 files changed, +120 −45 lines


keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Lines changed: 118 additions & 44 deletions
@@ -79,16 +79,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [x] e2e Tests for all Beta API Operations (endpoints)
   - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
   - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
   - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
+- [x] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -235,6 +235,7 @@ bogged down.
 -->

 #### Story 1
+
 As a User of Kubernetes, I should be able to update my StatefulSet, more than one Pod at a time, in a
 RollingUpdate manner, if my Stateful app can tolerate more than one pod being down, thus allowing my
 update to finish much faster.
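
To make the story concrete, here is a minimal sketch of what such a user would set, using the `apps/v1` Go types (the value `2` is illustrative, and the `MaxUnavailableStatefulSet` feature gate must be enabled):

```golang
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Allow up to 2 pods of the StatefulSet to be unavailable during a rolling
	// update, instead of the default of 1.
	maxUnavailable := intstr.FromInt(2)
	ss := &appsv1.StatefulSet{}
	ss.Spec.UpdateStrategy = appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateStatefulSetStrategy{
			MaxUnavailable: &maxUnavailable,
		},
	}
	_ = ss // submit via client-go or render to YAML as usual
}
```
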
@@ -248,7 +249,9 @@ Go in to as much detail as necessary here.
 This might be a good place to talk about core concepts and how they relate.
 -->

-No.
+With the `maxUnavailable` feature enabled, we may bring down more than one pod at a time. If your
+application can't tolerate this behavior, you should disable this feature or leave the field
+unconfigured, which preserves the default behavior.

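As a sketch of these semantics (the helper name is hypothetical; the controller implements equivalent logic), leaving the field unconfigured degenerates to the default of 1:

```golang
package statefulset

import (
	apps "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// effectiveMaxUnavailable is a hypothetical helper illustrating the semantics
// above: a nil field (or a disabled feature gate) yields 1, preserving the
// classic one-pod-at-a-time behavior; percentages resolve against replicas,
// rounding up, clamped to at least 1.
func effectiveMaxUnavailable(ru *apps.RollingUpdateStatefulSetStrategy, replicas int, gateEnabled bool) (int, error) {
	if !gateEnabled || ru == nil || ru.MaxUnavailable == nil {
		return 1, nil
	}
	v, err := intstr.GetScaledValueFromIntOrPercent(ru.MaxUnavailable, replicas, true)
	if err != nil {
		return 0, err
	}
	if v < 1 {
		v = 1
	}
	return v, nil
}
```
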
 ### Risks and Mitigations

@@ -425,6 +428,53 @@ https://github.com/kubernetes/kubernetes/blob/v1.13.0/pkg/controller/statefulset
 ...
 ```

+#### Metrics
+
+We'll add a new metric named `rolling-update-duration-seconds`; it tracks how long a statefulset takes to finish a rolling update.
+The general logic is:
+
+When [performing update](https://github.com/kubernetes/kubernetes/blob/c984d53b31655924b87a57bfd4d8ff90aaeab9f8/pkg/controller/statefulset/stateful_set_control.go#L97-L138),
+if `currentRevision` != `updateRevision`, we take that as the start of a rolling update. But because a rolling update cannot
+finish in one reconciling loop, we have to track it across loops. We have two approaches:
+
+- One is to add a new field to `defaultStatefulSetControl`, like below:
+
+```golang
+type defaultStatefulSetControl struct {
+	podControl        *StatefulPodControl
+	statusUpdater     StatefulSetStatusUpdaterInterface
+	controllerHistory history.Interface
+	recorder          record.EventRecorder
+
+	// <Newly Added>
+	// rollingUpdateStartTimes records when each statefulset starts to perform a rolling update.
+	// The key is the statefulset name, the value is the start time.
+	rollingUpdateStartTimes map[string]time.Time
+}
+```
+
+Then, when `currentRevision` != `updateRevision`, we check whether the statefulset name
+exists in `rollingUpdateStartTimes`; if it exists, we skip it, and if not, we write the
+start time.
+
+If we find the rolling update has completed in [updateStatefulSetStatus](https://github.com/kubernetes/kubernetes/blob/c984d53b31655924b87a57bfd4d8ff90aaeab9f8/pkg/controller/statefulset/stateful_set_control.go#L682-L701),
+we report the metric there and clear the statefulset from `rollingUpdateStartTimes`.
+
+- Another approach is to add a new field to `StatefulSetStatus`, as below:
+
+```golang
+type StatefulSetStatus struct {
+	// rollingUpdateStartTime records when the statefulSet starts a rolling update.
+	rollingUpdateStartTime *time.Time
+}
+```
+
+The general logic looks the same, but we update the time together with the status update, so there are no
+extra API calls.
+
+We prefer Opt-2, as we already record rolling-update messages in the status.
+
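For illustration, a minimal sketch of how Opt-2 could feed the metric, following the `k8s.io/component-base/metrics` conventions used by kube-controller-manager; the snake_case metric name, labels, and buckets are assumptions (Prometheus names cannot contain dashes), and `RollingUpdateStartTime` is the hypothetical status field proposed above, written as an exported `metav1.Time` as API conventions would require:

```golang
package statefulset

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
	apps "k8s.io/kubernetes/pkg/apis/apps"
)

// Illustrative registration; label set and buckets are assumptions.
var rollingUpdateDuration = metrics.NewHistogramVec(
	&metrics.HistogramOpts{
		Subsystem:      "statefulset",
		Name:           "rolling_update_duration_seconds",
		Help:           "How long a StatefulSet takes to finish a rolling update.",
		Buckets:        metrics.ExponentialBuckets(1, 2, 14),
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"namespace", "name"},
)

func init() {
	legacyregistry.MustRegister(rollingUpdateDuration)
}

// markRollingUpdateProgress sketches the Opt-2 flow: stamp the start time into
// status when revisions diverge (persisted by the regular status update), then
// observe the duration and clear the stamp once they converge.
func markRollingUpdateProgress(set *apps.StatefulSet, status *apps.StatefulSetStatus, now metav1.Time) {
	if status.CurrentRevision != status.UpdateRevision {
		if status.RollingUpdateStartTime == nil { // hypothetical Opt-2 field
			status.RollingUpdateStartTime = &now
		}
		return
	}
	if status.RollingUpdateStartTime != nil {
		rollingUpdateDuration.WithLabelValues(set.Namespace, set.Name).
			Observe(now.Time.Sub(status.RollingUpdateStartTime.Time).Seconds())
		status.RollingUpdateStartTime = nil
	}
}
```
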
 ### Test Plan

 <!--
@@ -482,6 +532,11 @@ Testcases:
 - maxUnavailable greater than 1 with partition and staged pods greater than maxUnavailable
 - maxUnavailable greater than 1 with partition and maxUnavailable greater than replicas

+New testcases being added:
+
+- New metric `rolling-update-duration-seconds` should calculate the duration correctly.
+- Feature enablement/disablement tests.
+
 Coverage:

 - `pkg/apis/apps/v1`: `2023-05-26` - `71.7%`
@@ -500,9 +555,9 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->

-We have controller tests in `pkg/controller/statefulset` cover all cases. What's more,
-rolling update is hard to simulate in integration tests because we want to verify the pod
-status but actually they're faked. But we'll have e2e tests instead.
+Controller tests in `pkg/controller/statefulset` cover all cases. Since pod statuses
+are faked in integration tests, simulating rolling update is difficult. We're opting
+for e2e tests instead.

 ##### e2e tests

@@ -590,6 +645,8 @@ in back-to-back releases.
 #### Beta

 - Enabled by default with default value of 1 with upgrade downgrade tested at least manually.
+- Additional e2e tests listed in the test plan should be added.
+- No critical bugs reported and no negative feedback collected.

 ### Upgrade / Downgrade Strategy

@@ -609,19 +666,19 @@ enhancement:

 - What changes (in invocations, configurations, API use, etc.) is an existing
   cluster required to make on upgrade, in order to maintain previous behavior?
-  - Nothing special, it's backward compatibility
+  - This is a new field; it only takes effect after an upgrade. To maintain the
+    previous behavior, you can disable the feature gate manually in Beta or leave
+    the field unconfigured.
 - What changes (in invocations, configurations, API use, etc.) is an existing
   cluster required to make on upgrade, in order to make use of the enhancement?
-  - This feature is enabled by default in beta so you can configure the maxUnavailable to meet your requirements
+  - This feature is enabled by default in Beta, so you can configure the maxUnavailable
+    to meet your requirements.

 #### Downgrade

-when downgrade from a release `maxUnavailable` enabled to a release it's disabled:
-
-- If we downgrade to a release even doesn't have this field, then there's no way to discover it.
-- Else, if we create statefulset with maxUnavailable configured, it will be dropped to nil automatically.
-
-Anyway, we'll back to that time rolling update with a single unavailable pod.
+- If we try to set this field, it will be silently dropped.
+- If an object with this field already exists, we keep it as it is, but the controller
+  will simply ignore the field.

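A minimal sketch of the silent-drop semantics described above, following the usual `dropDisabledFields` pattern in the apiserver registry strategy (helper names are assumptions):

```golang
package statefulset

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	apps "k8s.io/kubernetes/pkg/apis/apps"
	"k8s.io/kubernetes/pkg/features"
)

// dropDisabledFields clears maxUnavailable on incoming objects when the gate is
// off, unless the existing object already uses it, so updates don't silently
// lose data that was written while the gate was on.
func dropDisabledFields(newSS, oldSS *apps.StatefulSet) {
	if utilfeature.DefaultFeatureGate.Enabled(features.MaxUnavailableStatefulSet) || maxUnavailableInUse(oldSS) {
		return
	}
	if newSS.Spec.UpdateStrategy.RollingUpdate != nil {
		newSS.Spec.UpdateStrategy.RollingUpdate.MaxUnavailable = nil
	}
}

// maxUnavailableInUse reports whether an existing object already sets the field.
func maxUnavailableInUse(ss *apps.StatefulSet) bool {
	return ss != nil &&
		ss.Spec.UpdateStrategy.RollingUpdate != nil &&
		ss.Spec.UpdateStrategy.RollingUpdate.MaxUnavailable != nil
}
```
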
 ### Version Skew Strategy

@@ -638,10 +695,13 @@ enhancement:
 CRI or CNI may require updating that component before the kubelet.
 -->

-- If both api-server and kube-controller-manager enabled this feature, then everything works as expected
-- If only api-server enabled this feature, then we can configure the `maxUnavailable` field but it will not work.
-- If only kube-controller-manager enabled this feature, then for newly created statefulSet, the `maxUnavailable` will default to 1.
-- If both api-server and kube-controller-manager disabled this feature, then everything works as expected.
+- If both the api-server and the kube-controller-manager have this feature enabled, `maxUnavailable` will take effect.
+- If only the api-server has this feature enabled, the statefulset controller is unaware of the feature, so we
+  fall back to the default rolling-update behavior, one pod at a time.
+- If only the kube-controller-manager has this feature enabled, the field is invisible to users, and we keep the default
+  rolling-update behavior, one pod at a time.
+- If both the api-server and the kube-controller-manager have this feature disabled, only one Pod will be updated
+  at a time, as the default behavior.

 ## Production Readiness Review Questionnaire

@@ -659,13 +719,12 @@ No, the default behavior remains the same.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

-Yes this feature can be disabled. Once disabled, all existing StatefulSet will
-revert to the old behavior where rolling update will proceed one pod at a time.
+Yes, you can disable the feature gate manually once it's in Beta.

 ###### What happens if we reenable the feature if it was previously rolled back?

-We will restore the desired behavior for StatefulSets for which the maxUnavailable field wasn't deleted after
-the feature gate was disabled.
+If we disable the feature gate and reenable it later, `maxUnavailable` will take effect again;
+that is, rolling updates will once again respect the value of `maxUnavailable`.

 ###### Are there any tests for feature enablement/disablement?

@@ -682,9 +741,11 @@ You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->

-Yes, there are unit tests which make sure the field is correctly dropped
+There are unit tests which make sure the field is correctly dropped
 on feature enable and disabled, see [strategy tests](https://github.com/kubernetes/kubernetes/blob/23698d3e9f4f3b9738ba5a6fcefd17894a00624f/pkg/registry/apps/statefulset/strategy_test.go#L391-L417).

+A feature enablement/disablement test will also be added when graduating to Beta, modeled on [TestStatefulSetStartOrdinalEnablement](https://github.com/kubernetes/kubernetes/blob/23698d3e9f4f3b9738ba5a6fcefd17894a00624f/pkg/registry/apps/statefulset/strategy_test.go#L473).
+
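A rough shape for such a test, toggling the gate with the standard feature-gate testing helper (the test name and body are illustrative):

```golang
package statefulset

import (
	"testing"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"
	"k8s.io/kubernetes/pkg/features"
)

func TestStatefulSetMaxUnavailableEnablement(t *testing.T) {
	// Run the scenario once with the gate on and once with it off.
	for _, enabled := range []bool{true, false} {
		func() {
			defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate,
				features.MaxUnavailableStatefulSet, enabled)()
			// Prepare a StatefulSet with spec.updateStrategy.rollingUpdate.maxUnavailable
			// set, run it through PrepareForCreate/PrepareForUpdate, and assert the field
			// is kept when enabled and dropped when disabled.
		}()
	}
}
```
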
 ### Rollout, Upgrade and Rollback Planning

 ###### How can a rollout or rollback fail? Can it impact already running workloads?
@@ -699,11 +760,15 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->

-This could happen when we downgrade the kube-controller-manager from maxUnavailable-enabled release to a disabled one,
-but it will not impact running workloads, because `maxUnavailable` only works in rolling update.
+It depends:
+
+- In an HA cluster, if some of the API servers have this feature gate enabled but the others don't, then
+  already running workloads will not be impacted in rolling update, because the process is controlled
+  by the statefulset controller. But for newly created statefulsets, the `maxUnavailable` field will
+  be dropped if the request hits a feature-gate-disabled api-server.

-But if a statefulset has a plenty of replicas, when rolling update, it will take more time comparing to
-`maxUnavailable` enabled with a number greater than 1.
+- With this feature enabled and a statefulset in the middle of a rolling update, if we then disable the feature gate,
+  that statefulset will be impacted: it falls back to the default behavior, updating one pod at a time.

 ###### What specific metrics should inform a rollback?

@@ -712,33 +777,41 @@ What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

-When `maxUnavailable` enabled, and we're rolling update a statefulset, the number of pods brought down should
+- As a normal user, with `maxUnavailable` enabled, when rolling-updating a statefulset, the number of pods brought down should
 equal to the `maxUnavailable`, if not, we should rollback.

+- As an administrator (and also as a normal user), I can check the new metric `rolling-update-duration-seconds`: if we enabled
+the feature with `maxUnavailable` greater than 1 set for statefulsets but didn't see the duration fall, we may have a problem.
+This is not precise, because rolling-update duration can be impacted by several factors, e.g. pulling a big new image, but
+it can serve as an indicator to refer to.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 No, but it will be tested manually before merging the PR.

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
 No

 ### Monitoring Requirements

 ###### How can an operator determine if the feature is in use by workloads?
+
 If their StatefulSet rollingUpdate section has the field maxUnavailable specified with
 a value different than 1.
 The below command should show maxUnavailable value:
 ```
 kubectl get statefulsets -o yaml | grep maxUnavailable
 ```

-Or refer to the new metric `rolling-update-duration`, it should exist.
+Or refer to the new metric `rolling-update-duration-seconds`; it should exist.

 ###### How can someone using this feature know that it is working for their instance?

 With feature enabled, set the `maxUnavailable` great than 1, and pay attention to the rolling update pods at a time,
 it should equal to the `maxUnavailable`.
-Or when setting the `maxUnavailable` great than 1, the `rolling-update-duration` should decrease.
+Or refer to the `rolling-update-duration-seconds`; it can give some general indication. For example, when setting
+`maxUnavailable` greater than 1, the duration should drop.

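As a rough way to check this programmatically, a hedged helper sketch that counts currently-unavailable pods for comparison against `maxUnavailable` (assuming the `k8s.io/kubernetes/pkg/api/v1/pod` utility):

```golang
package statefulset

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	podutil "k8s.io/kubernetes/pkg/api/v1/pod"
)

// unavailablePods counts pods of a StatefulSet that are currently not available;
// during a rolling update this should stay at or below the configured maxUnavailable.
func unavailablePods(pods []*v1.Pod, minReadySeconds int32) int {
	n := 0
	for _, p := range pods {
		if !podutil.IsPodAvailable(p, minReadySeconds, metav1.Now()) {
			n++
		}
	}
	return n
}
```
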
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -757,7 +830,9 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->

-I think it has little relevance with SLOs, but rolling update at a very low speed which impacts the running services.
+Rolling update duration is impacted by a lot of factors, like container hooks, image size, network, and container logic,
+so what we can roughly reason about here is each individual statefulset. Generally, if we set `maxUnavailable` to X, the
+rolling update duration should shrink by roughly a factor of X. Keep in mind that this is not precise.

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

@@ -766,11 +841,8 @@ Pick one more of these and delete the rest.
 -->

 - [x] Metrics
-  - Component exposing the metric: kube-controller-manager
-  - Metric name: `rolling-update-duration{plugin="PodTopologySpread"}`
-  - Metric name: `schedule_attempts_total{result="error|unschedulable"}`
-
-None.
+  - Metric name: rolling-update-duration-seconds
+  - Components exposing the metric: kube-controller-manager

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -797,11 +869,14 @@ No
 A struct gets added to every StatefulSet object which has three fields, one 32 bit integer and two fields of type string.
 The struct in question is IntOrString.

+If we introduce the new metric with Opt-2, then we'll have a new field `rollingUpdateStartTime` of type `*Time`
+in the `StatefulSetStatus`.
+
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 No

 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
-The controller-manager will see very negligible and almost un-notoceable increase in cpu usage.
+The controller-manager will see a negligible, almost unnoticeable increase in CPU usage.

 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

@@ -828,14 +903,13 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->

-In a multi-master setup, when the cluster has skewed kube-controller-manager, the behaviors may differ.
+Disabling the feature gate while a rolling update is in progress.

 - [Failure mode brief description]
-- Detection: the `rolling-update-duration` didn't decrease when setting the `maxUnavailable` great than 1 or increased abnormally.
-- Mitigations: Disable the feature.
+- Detection: `maxUnavailable` is not respected.
+- Mitigations: If the application tolerates rolling update in parallel, set `podManagementPolicy=Parallel`.
 - Diagnostics: Set the logger level great than 4.
-- Testing: No testing, because the rolling update duration is hard to measure, it can be impact by a lot of things,
-like the master performance.
+- Testing: We have e2e tests covering the expected behaviors.

 ###### What steps should be taken if SLOs are not being met to determine the problem?

keps/sig-apps/961-maxunavailable-for-statefulset/kep.yaml

Lines changed: 2 additions & 1 deletion
@@ -40,7 +40,8 @@ feature-gates:
 components:
 - kube-controller-manager
 - kube-apiserver
+disable-supported: true

 # The following PRR answers are required at beta release
 metrics:
-- rolling-update-duration
+- rolling-update-duration-seconds
