Conversation

@kerthcet
Member

@kerthcet kerthcet commented May 11, 2023

  • One-line PR description: Bump StatefulSet maxUnavailable to beta
  • Other comments:

Commit 1: adapt to the latest KEP template.
Commit 2: what extra is needed when bumping to beta.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 11, 2023
@k8s-ci-robot k8s-ci-robot requested review from kow3ns and soltysh May 11, 2023 07:29
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 11, 2023
@wojtek-t wojtek-t self-assigned this May 11, 2023
@kerthcet kerthcet force-pushed the feat/upgrade-maxUnavailable-to-beta branch from a28043c to e767866 Compare May 11, 2023 07:32
@kerthcet kerthcet changed the title [WIP] KEP-961: Bump Statefulset maxUnavailable to beta KEP-961: Bump Statefulset maxUnavailable to beta May 16, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
@soltysh
Contributor

soltysh commented May 18, 2023

/assign

@kerthcet
Member Author

Sorry for the delay, updates are on the way.

@soltysh soltysh mentioned this pull request May 31, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kerthcet
Once this PR has been reviewed and has the lgtm label, please ask for approval from soltysh, wojtek-t. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member Author

@kerthcet kerthcet left a comment

Thanks @soltysh for your detailed reviews. Most of the content I copy-pasted from the original documents to bring them up to date; I will pay more attention to this next time.

Contributor

@soltysh soltysh left a comment

There are still several things to address here.

Member

@wojtek-t wojtek-t left a comment

Please address @soltysh's comments


Yes, it can impact already running workloads.
This could happen when we downgrade the kube-controller-manager from a maxUnavailable-enabled release to a disabled one,
but otherwise it will not impact running workloads, because `maxUnavailable` only takes effect during a rolling update.
Member

+1

Also, the other question that I have is what happens if there is an in-progress rollout (with maxUnavailable > 1) and we disable the feature...

@kerthcet kerthcet force-pushed the feat/upgrade-maxUnavailable-to-beta branch from e9eae01 to c9a8b37 Compare June 15, 2023 12:41
@kerthcet
Member Author

PTAL @soltysh @wojtek-t @logicalhan
Besides addressing your comments, I added the logic for the new metric rolling-update-duration-seconds.

will rollout across nodes.
-->

It depends, like
Member

I'm not sure this really answers the question.

Please correct me if I'm wrong, but in general:

  • this doesn't affect running workloads on rollout [the only exception is if the feature was already enabled in the past, there are StatefulSets with the new field set, and they are now going through a rolling upgrade when we enable the feature gate again]

On rollback my understanding is unclear - what happens if my StatefulSet is going through a rolling upgrade now?

Member Author

If you mean what will happen if we disable the feature during a StatefulSet's rolling update: the controller will wait for all the pods to be ready and then rolling-update one pod at a time.

that might indicate a serious problem?
-->

- As a normal user, with `maxUnavailable` enabled, while rolling-updating a StatefulSet, the number of pods brought down should
Member

This isn't actionable at all if I'm managing e.g. 1000 clusters - I can't monitor the number of pods brought down for individual StatefulSets.

- As an administrator (as well as a normal user), I can check the new metric `rolling-update-duration-seconds`. If we enabled the feature
with `maxUnavailable` greater than 1 set for StatefulSets but didn't see the duration fall, then we may have a problem.
This is not precise, because rolling-update duration can be impacted by several factors, e.g. pulling a big new image, but
it can serve as an indicator.
Member

So can we come up with some better metric for it?

We're reporting ReadyReplicas/AvailableReplicas in Status - can we expose a metric when the difference between these and spec.Replicas is more than maxUnavailable, as an example?

@soltysh - for thoughts

Contributor

I like the proposed metric a lot more than the rolling time.

Member Author

Sorry, what signal can we get from ReadyReplicas != AvailableReplicas? In my understanding, this only relates to another field, minReadySeconds.

Member

Not ReadyReplicas != AvailableReplicas.

What I was saying is spec.Replicas - status.readyReplicas > spec.maxUnavailable

We can play with what exactly to expose (maybe a counter, maybe a histogram of replicas - readyReplicas - maxUnavailable; maybe use availableReplicas instead of readyReplicas).
But something along those lines.
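For concreteness, a minimal Go sketch of the condition under discussion; the helper name is made up for illustration, and whether readyReplicas or availableReplicas is the right input (and how the IntOrString is resolved to a count) was still open at this point:

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
)

// exceedsMaxUnavailable sketches the check described above:
// spec.Replicas - status.readyReplicas > maxUnavailable.
// maxUnavailable is assumed to be already resolved from its
// IntOrString form to an absolute count.
func exceedsMaxUnavailable(sts *appsv1.StatefulSet, maxUnavailable int32) bool {
	return *sts.Spec.Replicas-sts.Status.ReadyReplicas > maxUnavailable
}
```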

Member Author

spec.Replicas - status.readyReplicas should always be less than or equal to spec.maxUnavailable, or this is a bug.
So the metric makes little sense, I think...

As for the rolling time, it is valuable beyond maxUnavailable: it can help measure how long a rolling update takes, providing data for deciding whether to roll out in parallel when possible. It could actually be independent of this KEP.

Member Author

@kerthcet kerthcet Jun 16, 2023

How should I compare my 3-replica hello-world with a 1000-replica gigantic Java app with a 10m startup?

Yes, this also makes me doubt the metric. OK, I'll drop this idea then; it looks simpler now. Thanks.

Member

@dgrisonnet dgrisonnet Aug 31, 2023

@wojtek-t this information is for the most part exposed in kube-state-metrics today: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/statefulset-metrics.md and we could add maxUnavailable as part of this effort. Would that be acceptable from a PRR standpoint, or is it limited to native metrics?

Member

That's a good question. Technically not everyone is running kube-state-metrics, so I would really prefer a native one.
But I added it to the next PRR meeting as an agenda item so that we have a common answer across approvers.

Member

Just for info, kube-state-metrics doesn't follow the Kubernetes metrics stability framework just yet, so there are fewer guarantees for its metrics than for the ones we have in k/k.

But, although that's the case today, it would still be interesting to know whether having API-related metrics defined in kube-state-metrics would be acceptable from a KEP standpoint, since in most cases they are unfit for the Kubernetes codebase.

Member

Just to follow up on it.
We didn't come up with a very concrete conclusion, but the hand-wavy one was:

  • for the very simple metrics that effectively monitor a trivial logic on k8s side, we're fine with ksm metrics [like number of objects with a given field etc.]
  • for things where the logic of the feature is non-trivial, we actually want a native metric

Now, with this feature we land somewhere on the fence. Let's say we add "maxUnavailable" as a metric to ksm.
IIUC, we can then check the value that I suggested above (replicas - ready > maxUnavailable) for a given statefulset purely based on ksm.
But can we easily check how many statefulsets are in a bad state at a given time?

Thinking further about it, maybe the metric that I would actually want to see is something like
stateful_set_in_bad_state [we obviously need a better name] with a label reason.

For this feature we can set reason=exceededMaxUnavailable and increase the value of that metric any time we process a statefulset and observe that replicas - ready > maxUnavailable.

That actually seems (a) simple enough to implement (b) generic enough for future extension (c) useful enough as it shows if something goes wrong.

Thoughts?
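For illustration, a rough sketch of what such a counter could look like under the component-base metrics conventions; the metric name, subsystem, reason value, and helper here are placeholders taken from this thread, not a settled API:

```go
package metrics

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

// statefulSetBadStateTotal counts, per reason, how often the controller
// observed a StatefulSet in a bad state while syncing it.
var statefulSetBadStateTotal = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Subsystem:      "statefulset",
		Name:           "bad_state_total", // placeholder, needs a better name
		Help:           "Number of times a StatefulSet was observed in a bad state, by reason.",
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"reason"},
)

func init() {
	legacyregistry.MustRegister(statefulSetBadStateTotal)
}

// recordExceededMaxUnavailable would be called from the sync loop whenever
// replicas - ready > maxUnavailable is observed for a StatefulSet.
func recordExceededMaxUnavailable() {
	statefulSetBadStateTotal.WithLabelValues("exceededMaxUnavailable").Inc()
}
```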

```
kubectl get statefulsets -o yaml | grep maxUnavailable
```

Or refer to the new metric `rolling-update-duration-seconds`; it should exist.
Member

This isn't true - the metric will exist even if none of the workloads is using the feature.

- Testing: Are there any tests for failure mode? If not, describe why.
-->

Disable the feature-gate when performing rolling-update.
Member

I don't think I understood the message of this answer - can you clarify this?

Contributor

@soltysh soltysh left a comment

There are still quite a few comments to address

-->

With `maxUnavailable` feature enabled, we'll bring down more than one pod at a time, if your application can't tolerate
this behavior, you should absolutely disable this feature or leave the field unconfigured, which will behave as the default
Contributor

Suggested change
this behavior, you should absolutely disable this feature or leave the field unconfigured, which will behave as the default
this behavior, you should absolutely leave the field unset or use 1, which will match default behavior

tried this feature in Alpha, we would have time to fix issues.
#### Metrics

We'll add a new metric named `rolling-update-duration-seconds`; it tracks how long a statefulset takes to finish a rolling update,
Contributor

Nit: all the metrics have underscores, so rolling_update_duration_seconds will be more appropriate.
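To make the nit concrete, a sketch of the histogram registered under the underscore-style name; the subsystem, buckets, and helper are illustrative assumptions rather than the KEP's final design:

```go
package metrics

import (
	"time"

	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

// rollingUpdateDuration tracks how long a StatefulSet rolling update takes.
var rollingUpdateDuration = metrics.NewHistogram(
	&metrics.HistogramOpts{
		Subsystem:      "statefulset",
		Name:           "rolling_update_duration_seconds",
		Help:           "Time taken for a StatefulSet rolling update to complete.",
		Buckets:        metrics.ExponentialBuckets(10, 2, 10), // 10s up to ~85min
		StabilityLevel: metrics.ALPHA,
	},
)

func init() {
	legacyregistry.MustRegister(rollingUpdateDuration)
}

// observeRollingUpdate records one completed rolling update.
func observeRollingUpdate(start time.Time) {
	rollingUpdateDuration.Observe(time.Since(start).Seconds())
}
```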


// <Newly Added>
// rollingUpdateStartTimes records when each statefulset starts to perform rolling update.
// key is the statefulset name, value is the start time.
Contributor

Name is not unique across namespaces; I'd suggest types.UID, since that's actually unique.
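A minimal sketch of the tracking map keyed by types.UID as suggested; the surrounding type and method names are illustrative only:

```go
package statefulset

import (
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/types"
)

type rolloutTracker struct {
	mu sync.Mutex
	// rollingUpdateStartTimes records when each StatefulSet began its
	// rolling update. The UID is unique cluster-wide, unlike the name,
	// which can repeat across namespaces.
	rollingUpdateStartTimes map[types.UID]time.Time
}

// markStart records the start time once per rolling update.
func (t *rolloutTracker) markStart(uid types.UID) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.rollingUpdateStartTimes[uid]; !ok {
		t.rollingUpdateStartTimes[uid] = time.Now()
	}
}
```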

The logic generally looks the same: we'll update the time together with the status update, so there are no
extra API calls.

Prefer Opt-2, as we already record the rolling-update messages in the status.
Contributor

I don't agree with that approach; I don't see a reason to expose internal information in the API. The first option is much preferable.

-->

- `test/e2e/apps`: we'll add a test to verify that the number of pods brought down each
time is equal to the configured maxUnavailable.
Contributor

Hmmm... that kind of test might actually be easier as an integration test, because, like you wrote above, "pod statuses are faked".


Rolling-update duration is impacted by a lot of factors, like container hooks, image size, network, and container logic,
so what we can roughly know here applies to each individual StatefulSet. So generally, if we set `maxUnavailable` to X, then
the rolling-update duration should drop by a factor of X. Keep in mind that this is not precise.
Contributor

I wouldn't put it as X times; a lot will depend on how you implement the metric. I don't expect you'll expose this per StatefulSet, since that wouldn't be reasonable, but rather as some kind of histogram representing all StatefulSet rollouts; in that case a lot will depend on how many StatefulSets are using this feature. I'd probably write that when using this feature you should at least achieve similar rollout times, and that with increased adoption of this feature the overall drop in the metric will be more significant.

-->

- [x] Metrics
- Metric name: rolling-update-duration-seconds
Contributor

rolling_update_duration_seconds

The struct in question is IntOrString.

If we introduce the new metric with Opt-2, then we'll have a new field `rollingUpdateStartTime` with `*Time` type
in the `StatefulSetStatus`.
Contributor

Nope, see my other comment.
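For context on the IntOrString note above, a short sketch of setting the field on the rolling-update strategy (it takes either an absolute count or a percentage); the helper function is hypothetical:

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// withMaxUnavailable sets maxUnavailable on a StatefulSet's rolling-update
// strategy. The field is gated by the MaxUnavailableStatefulSet feature gate.
func withMaxUnavailable(sts *appsv1.StatefulSet) {
	two := intstr.FromInt(2) // or intstr.FromString("20%")
	sts.Spec.UpdateStrategy = appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateStatefulSetStrategy{
			MaxUnavailable: &two,
		},
	}
}
```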

Disable the feature-gate when performing rolling-update.

- [Failure mode brief description]
- Detection: The `maxUnavailabel` is not respected.
Contributor

Suggested change
- Detection: The `maxUnavailabel` is not respected.
- Detection: The `maxUnavailable` is not respected.

- [Failure mode brief description]
- Detection: The `maxUnavailabel` is not respected.
- Mitigations: If rolling update in parallel is tolerated for application, set the `podManagementPolicy=Parallel`.
- Diagnostics: Set the logger level great than 4.
Contributor

Suggested change
- Diagnostics: Set the logger level great than 4.
- Diagnostics: Set the logger level greater than 4.

@wojtek-t
Member

@kerthcet - will you have time to update the KEP? It's now a good time to start thinking about 1.29, with better chances for review capacity :)

@jeremyrickard
Contributor

This looks like it has a number of unresolved comments from June; it doesn't seem likely that it's being targeted for 1.29?

@kerthcet
Member Author

kerthcet commented Nov 7, 2023

This looks like it has a number of unresolved comments from June; it doesn't seem likely that it's being targeted for 1.29?

Yes, I'll put in some effort next release. Thanks.

@atiratree
Member

@kerthcet are you able to work on pushing this to beta in 1.30 timeframe?

@soltysh
Contributor

soltysh commented Feb 5, 2024

Replaced with #4474
/close

@k8s-ci-robot
Contributor

@soltysh: Closed this PR.

In response to this:

Replaced with #4474
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Edwinhr716 pushed a commit to Edwinhr716/enhancements that referenced this pull request Apr 3, 2025
k8s-ci-robot pushed a commit that referenced this pull request Jun 17, 2025
…) (#5228)

* Adapt to the latest template

Signed-off-by: kerthcet <[email protected]>

* Bump to beta

Signed-off-by: kerthcet <[email protected]>

* address comments

Signed-off-by: kerthcet <[email protected]>

* Address commetns

Signed-off-by: kerthcet <[email protected]>

* Add new metric rollingUpdateDurationSeconds to statefulset

Signed-off-by: kerthcet <[email protected]>

* go over review from #3997, change metric

* review notes, still need approval on metrics

* running and available instead of running and ready

* review notes and UNRESOLVED blocks

Signed-off-by: Lucas Severo Alves <[email protected]>

* added metric information

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Janet Kuo <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Janet Kuo <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Janet Kuo <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Janet Kuo <[email protected]>

* addressed comments

* added metric for detection

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md

Co-authored-by: Maciej Szulik <[email protected]>

* addressing comments

* updated test coverage numbers

* reworded sentences

* added explanation on mid-rollout flag disable

---------

Signed-off-by: kerthcet <[email protected]>
Signed-off-by: Lucas Severo Alves <[email protected]>
Co-authored-by: kerthcet <[email protected]>
Co-authored-by: Lucas Severo Alves <[email protected]>
Co-authored-by: Janet Kuo <[email protected]>
Co-authored-by: Maciej Szulik <[email protected]>
ajaysundark pushed a commit to ajaysundark/enhancements that referenced this pull request Oct 6, 2025