-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-961: Bump Statefulset maxUnavailable to beta #3997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-961: Bump Statefulset maxUnavailable to beta #3997
Conversation
a28043c to
e767866
Compare
|
/assign |
|
Sorry for the delay, updates are on the way. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kerthcet The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @soltysh for your detailed reviews, most of the part I just copy-paste from the original documents to make them up-to-date, I will pay more attention to this next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still several things to address here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address @soltysh 's comments
|
|
||
| Yes, it can impact already running workloads. | ||
| This could happen when we downgrade the kube-controller-manager from maxUnavailable-enabled release to a disabled one, | ||
| but it will not impact running workloads, because `maxUnavailable` only works in rolling update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Also the other question that I have is what happens if there is an in-progress rollout (with maxUnavailable>1) and we disable the feature....
Signed-off-by: kerthcet <[email protected]>
Signed-off-by: kerthcet <[email protected]>
Signed-off-by: kerthcet <[email protected]>
Signed-off-by: kerthcet <[email protected]>
Signed-off-by: kerthcet <[email protected]>
e9eae01 to
c9a8b37
Compare
|
PTAL @soltysh @wojtek-t @logicalhan |
| will rollout across nodes. | ||
| --> | ||
|
|
||
| It depends, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this really answers the question.
Please correct me if I'm wrong, but in general:
- this doesn't affect running workload on rollout [the only expection is if the feature was already enabled in the past and there are StatefulSets with the new field set and now they are going through a rolling upgrade when we enable the feature-gate-again]
On rollback may understanding is unclear - what happens if my StatsefulSet is going through a rolling upgrade now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean what will happen if we disable the feature during a statefulset's rolling-update, then it will wait for all the pods ready and rolling-udpate a pod one time.
| that might indicate a serious problem? | ||
| --> | ||
|
|
||
| - As a normal user, with `maxUnavailable` enabled, and we're rolling update a statefulset, the number of pods brought down should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actionable at all if I'm managing e.g. 1000 clusters - I can't monitor number of pods brought down for individual statefulsets.
| - As a administrator(as well for normal user), I can check the new metric `rolling-update-duration-seconds`, if we enabled the feature with | ||
| greater than 1 `maxUnavailable` set for statefulsets, but didn't see the duration fall down, then we may have some problems here. | ||
| This is not precise because rolling-update duration can be impacted by several factors, e.g. pulling down a big, new image, but | ||
| can be some indicators refer to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So can we come up with some better metric for it?
We're reporting ReadyReplicas/AvailableReplicas in Status - can we expose a metric if a difference between these and spec.Replicas is more than maxUnavailable as an example?
@soltysh - for thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the proposed metric a lot more than the rolling time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what messages can we get from ReadyReplicas != AvailableReplicas, in my understanding, this only relates to another filed minReadySeconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not ReadyReplicas != AvailableReplicas.
What I was saying is spec.Replicas - status.readyReplicas > spec.maxUnavailable
We can play with the idea what exactly to expose (maybe a counter, maybe a histogram of replicas - readyReplica - maxUnavailable. Maybe instead of readyReplicas use avalableReplicas.
But something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec.Replicas - status.readyReplicas should always be less than or equal to spec.maxUnavailable, or this is a bug.
The metric makes little sense I think ...
So for the rolling time, it values not only for the maxUnavailable, it can help to measure how long the rolling update takes, providing data for decisions whether to rolling-update in parallel if possible. It can be independent of the KEP actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I compare my 3-replica hello-world with 1000-replica gigantitc java app with 10m startup
Yes this also makes me doubtable about the metric, ok, then I'll drop this idea and it looks more simple now. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtek-t these information are for the most part exposed in kube-state-metrics today: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/statefulset-metrics.md and we could add maxUnavailable with this effort. Would that be acceptable from a PRR standpoint or is it limited to native metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Technically not everyone is running kube-state-metrics, so I would really prefer native one.
But I added it to next PRR meeting as an agenda item so that we have a common answer across approvers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for info, kube-state-metrics doesn't follow Kubernetes metrics stability framework just yet, so there are less guarantees towards its metrics than the ones we have in k/k.
But, although that's the case today, it would still be interesting to know if having API related metrics defined in kube-state-metrics would be acceptable from a KEP standpoint since in most cases they are unfit to the Kubernetes codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to follow-up on it.
We didn't come up with a very concrete conclusion, but the hand-wavy one was:
- for the very simple metrics that effectively monitor a trivial logic on k8s side, we're fine with ksm metrics [like number of objects with a given field etc.]
- for things where the logic of the feature is non-trivial, we actually want a native metric
Now, with this feature we get somewhere on the fence. Let's say we add "maxUnavailable" as a metric to ksm.
IIUC, we can then check the value that I suggested above (replicas-read > maxUnavailable) for a given statefulset purely based on ksm.
Can we easily check how many statefulsets are in a bad state in a given time?
Now thinking further about it, maybe the metric that I would actually want to see is something like
stateful_set_in_bad_state [we obviously need a better name] with a label reason.
Now, for this feature we can set reason=exceededMaxUnavailable and increase the value of that metric at any time when we process a statefulset and we observe that replicase-ready >maxUnavailable".
That actually seems (a) simple enough to implement (b) generic enough for future extension (c) useful enough as it shows if something goes wrong.
Thoughts?
| kubectl get statefulsets -o yaml | grep maxUnavailable | ||
| ``` | ||
|
|
||
| Or refer to the new metric `rolling-update-duration-seconds`, it should exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true - the metric will exist even if none of the workloads is using the feature.
| - Testing: Are there any tests for failure mode? If not, describe why. | ||
| --> | ||
|
|
||
| Disable the feature-gate when performing rolling-update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understood the message of this answer - can you clarify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still quite a few comments to address
| --> | ||
|
|
||
| With `maxUnavailable` feature enabled, we'll bring down more than one pod at a time, if your application can't tolerate | ||
| this behavior, you should absolutely disable this feature or leave the field unconfigured, which will behave as the default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| this behavior, you should absolutely disable this feature or leave the field unconfigured, which will behave as the default | |
| this behavior, you should absolutely leave the field unset or use 1, which will match default behavior |
| tried this feature in Alpha, we would have time to fix issues. | ||
| #### Metrics | ||
|
|
||
| We'll add a new metric named `rolling-update-duration-seconds`, it tracks how long a statefulset takes to finish a rolling-update, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: all the metrics have underscores, so rolling_update_duration_seconds will be more appropriate.
|
|
||
| // <Newly Added> | ||
| // rollingUpdateStartTimes records when each statefulset starts to perform rolling update. | ||
| // key is the statefulset name, value is the start time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name is not unique across namespaces, I'd suggest types.UID, since that's more unique.
| The logic general looks the same, we'll update the time together with status update, so there's no | ||
| extra api calls. | ||
|
|
||
| Prefer Opt-2 as we already record the rolling update messages in the status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with that approach, I don't see a reason we want to internal information in the API. First option is much more preferable.
| --> | ||
|
|
||
| - `test/e2e/apps: we'll add a test to verify that the number of pods brought down each | ||
| time is equal to configured maxUnavailable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... that kind of test actually might be easier as integration, because like you wrote above "pod statuses are faked".
|
|
||
| Rolling update duration is impacted by a lot of factors, like the container hooks, image size, network, container logics, | ||
| so what we can roughly know here is for each individual statefulset. So generally, if we set the `maxUnavailable` to X, then | ||
| the rolling update duration should reduce X times. Keep in mind that, this is not precise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't put it as X times, a lot will depend how you implement the metric, and I don't expect you'll expose this as per statefulset, since that wouldn't be reasonable, but rather some kind of histogram, which will represent all statefulset rollouts, so in that case a lot will depend how many statefulset are using this feature. I'd probably write that when using this feature you should at least achieve similar rollout times, and with increased adoption of this feature the overall drop of the metric will be more significant.
| --> | ||
|
|
||
| - [x] Metrics | ||
| - Metric name: rolling-update-duration-seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rolling_update_duration_seconds
| The struct in question is IntOrString. | ||
|
|
||
| If we introduce the new metric with Opt-2, then we'll have a new field `rollingUpdateStartTime` with `*Time` type | ||
| in the `StatefulSetStatus`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, see my other comment.
| Disable the feature-gate when performing rolling-update. | ||
|
|
||
| - [Failure mode brief description] | ||
| - Detection: The `maxUnavailabel` is not respected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Detection: The `maxUnavailabel` is not respected. | |
| - Detection: The `maxUnavailable` is not respected. |
| - [Failure mode brief description] | ||
| - Detection: The `maxUnavailabel` is not respected. | ||
| - Mitigations: If rolling update in parallel is tolerated for application, set the `podManagementPolicy=Parallel`. | ||
| - Diagnostics: Set the logger level great than 4. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Diagnostics: Set the logger level great than 4. | |
| - Diagnostics: Set the logger level greater than 4. |
|
@kerthcet - will you have time to update the KEP - it's now a good time to start thinking about 1.29 and bigger chances for review capacity :) |
|
This looks like it has a number of unresolved comments from June, it doesn't seem likely that it's being targeted for 1.29? |
Yes, I'll make some efforts next release. Thanks. |
|
@kerthcet are you able to work on pushing this to beta in 1.30 timeframe? |
|
Replaced with #4474 |
|
@soltysh: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…) (#5228) * Adapt to the latest template Signed-off-by: kerthcet <[email protected]> * Bump to beta Signed-off-by: kerthcet <[email protected]> * address comments Signed-off-by: kerthcet <[email protected]> * Address commetns Signed-off-by: kerthcet <[email protected]> * Add new metric rollingUpdateDurationSeconds to statefulset Signed-off-by: kerthcet <[email protected]> * go over review from #3997, change metric * review notes, still need approval on metrics * running and available instead of running and ready * review notes and UNRESOLVED blocks Signed-off-by: Lucas Severo Alves <[email protected]> * added metric information * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * addressed comments * added metric for detection * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * addressing comments * updated test coverage numbers * reworded sentences * added explanation on mid-rollout flag disable --------- Signed-off-by: kerthcet <[email protected]> Signed-off-by: Lucas Severo Alves <[email protected]> Co-authored-by: kerthcet <[email protected]> Co-authored-by: Lucas Severo Alves <[email protected]> Co-authored-by: Janet Kuo <[email protected]> Co-authored-by: Maciej Szulik <[email protected]>
…ernetes#4474) (kubernetes#5228) * Adapt to the latest template Signed-off-by: kerthcet <[email protected]> * Bump to beta Signed-off-by: kerthcet <[email protected]> * address comments Signed-off-by: kerthcet <[email protected]> * Address commetns Signed-off-by: kerthcet <[email protected]> * Add new metric rollingUpdateDurationSeconds to statefulset Signed-off-by: kerthcet <[email protected]> * go over review from kubernetes#3997, change metric * review notes, still need approval on metrics * running and available instead of running and ready * review notes and UNRESOLVED blocks Signed-off-by: Lucas Severo Alves <[email protected]> * added metric information * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Janet Kuo <[email protected]> * addressed comments * added metric for detection * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * Update keps/sig-apps/961-maxunavailable-for-statefulset/README.md Co-authored-by: Maciej Szulik <[email protected]> * addressing comments * updated test coverage numbers * reworded sentences * added explanation on mid-rollout flag disable --------- Signed-off-by: kerthcet <[email protected]> Signed-off-by: Lucas Severo Alves <[email protected]> Co-authored-by: kerthcet <[email protected]> Co-authored-by: Lucas Severo Alves <[email protected]> Co-authored-by: Janet Kuo <[email protected]> Co-authored-by: Maciej Szulik <[email protected]>
No. 1 commit: adapt to the latest KEP template
No. 2 commit: what extra needed when bumping to beta.