KEP-3751: add error handling strict mode #5485

huww98 · 2025-08-17T14:55:45Z

One-line PR description: add error handling strict mode

Issue link: Kubernetes VolumeAttributesClass ModifyVolume #3751

Other comments: Written on top of KEP-3751: add error handling #5482. I opened a separate PR to keep the discussion more focused. Please review only the last commit.

Also removes some incorrectly duplicated content and fixes a link.

k8s-ci-robot · 2025-08-17T14:55:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign saad-ali for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-storage/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

huww98 · 2025-08-17T15:03:51Z

keps/sig-storage/3751-volume-attributes-class/README.md

+* When there is an error, we will only retry with parameters of A or B, not allowing any other target.
+  User can only change the target after either the modify or the rollback succeeded.
+* if the `currentVolumeAttributesClass` is nil, we don't allow rollback.
+  Instead, we allow change target if the error is infeasible in this case, to allow user to correct some typos easily.


To make sure this does not break quota integrity, we will need a further restriction in the CSI spec:

- The SP MUST NOT have applied any modification to the volume as part of this specific call
+ The SP MUST NOT have applied any modification to the volume as part of this specific call or any consecutive previous calls with the same parameters.

But this also troubles me, because it means that whenever an SP updates its logic, it will need to consider compatibility with every previous version, since some volumes may be stuck in an error state for years. I know this is an extreme edge case, but without this restriction we cannot make a 100% promise even in strict mode.
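The two rules quoted from the README diff at the top of this comment can be sketched as a small predicate. This is only an illustration: the function name `canChangeTarget` and its parameters are hypothetical and do not appear in the KEP. It answers one question: while a modification is in an error state, may the user switch to a class other than the current one (A) or the in-flight target (B)?

```go
package main

// canChangeTarget is a hypothetical helper, not part of the KEP text.
// current is nil when currentVolumeAttributesClass has never been set,
// i.e. no class has been applied successfully yet.
// infeasible reports whether the last ModifyVolume error was infeasible.
func canChangeTarget(current *string, infeasible bool) bool {
	if current != nil {
		// A rollback target (A) exists, so strict mode only retries A or B.
		return false
	}
	// No rollback is possible; a new target is allowed only after an
	// infeasible error, so the user can correct a typo.
	return infeasible
}
```

With a non-nil current class the answer is always no, which is what keeps the set of classes that may have touched the volume limited to A and B.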

huww98 · 2025-08-17T15:08:06Z

keps/sig-storage/3751-volume-attributes-class/README.md

+    switch spec {
+        case target, "": return true
+        case cur: return false
+        default: return status != "Infeasible"


This logic may require more discussion. When current is A, target is B, and spec is C, where should we go? Going to C is not allowed because it may break quota.
My current proposal is to go to A if ModifyVolume(B) returns infeasible, and otherwise to keep retrying B, so that the system is more likely to converge automatically to the state the user specified.
But maybe we should always retry B to make the system more predictable.
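For discussion, here is the quoted switch wrapped in a complete function. The function name `retryTarget`, the parameter list, and the reading of the return value (true means keep retrying the target B, false means roll back to the current class A) are assumptions inferred from this comment; only the switch body comes from the diff.

```go
package main

// statusInfeasible is the only status value that appears in the quoted diff.
const statusInfeasible = "Infeasible"

// retryTarget is a hypothetical wrapper around the quoted switch.
// spec is the class the user currently requests, cur is the last class
// applied successfully (A), target is the class being applied (B), and
// status is the outcome of the last ModifyVolume(target) call.
// It returns true to keep retrying target and false to roll back to cur.
func retryTarget(spec, cur, target, status string) bool {
	switch spec {
	case target, "":
		// The user still wants B, or has cleared the spec: keep retrying B.
		return true
	case cur:
		// The user asked to go back to A: roll back.
		return false
	default:
		// The user wants some third class C, which strict mode cannot apply
		// directly. Roll back to A only if B is known to be infeasible.
		return status != statusInfeasible
	}
}
```

The alternative raised above, always retrying B, would amount to returning true in the default branch regardless of status.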

huww98 · 2025-08-17T15:11:13Z

keps/sig-storage/3751-volume-attributes-class/README.md

+One can verify this contains all states by arbitrarily changing the spec and verify it will still hit a listed state.
+
+For [strict mode](#strict-mode), the implementation is different:
+* We only leave the state `cur == A && target == B` when either modify to A or modify to B success.


This is the major difference from #5462. I propose NOT setting target to A before rolling back, so that we keep tracking B if the rollback also fails, which maintains quota integrity.
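A minimal sketch of that state handling, under stated assumptions: `volumeState` and `afterModify` are hypothetical names, the two fields stand in for the current and target class recorded in the PVC status, and clearing the target on success is an assumption about what "leaving the state" means.

```go
package main

// volumeState is a hypothetical, trimmed-down stand-in for the PVC status.
type volumeState struct {
	Current string // last class applied successfully (A)
	Target  string // class still tracked for quota purposes (B)
}

// afterModify returns the state after a ModifyVolume call with class
// applied has finished. ok reports whether the call succeeded.
func afterModify(s volumeState, applied string, ok bool) volumeState {
	if !ok {
		// Stay in cur == A && target == B. This also holds when applied is A
		// (a failed rollback): the target is not rewritten to A, so B keeps
		// being tracked.
		return s
	}
	// Only a successful modify to A or to B leaves the error state.
	// Clearing Target here is an assumption of this sketch.
	return volumeState{Current: applied, Target: ""}
}
```

Because a failed call never rewrites the target, a volume that may already be partially modified toward B stays accounted for as B until one of the two calls succeeds.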

huww98 · 2025-08-17T15:13:41Z

/cc @gnufied

huww98 added 3 commits August 16, 2025 21:45

KEP-3751: Fix the heading level for easier read

5ae3e6e

Also removes some incorrectly duplicated content and fixes a link.

KEP-3751: add error handling

7ab48a9

KEP-3751: add error handling strict mode

53475e2

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Aug 17, 2025

k8s-ci-robot requested a review from saad-ali August 17, 2025 14:55

k8s-ci-robot added the kind/kep label (categorizes KEP tracking issues and PRs modifying the KEP directory) Aug 17, 2025

k8s-ci-robot requested a review from xing-yang August 17, 2025 14:55

k8s-ci-robot added the sig/storage label (categorizes an issue or PR as relevant to SIG Storage) and the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Aug 17, 2025

k8s-ci-robot requested a review from gnufied August 17, 2025 15:13