-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-3751: add error handling strict mode #5485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
huww98
commented
Aug 17, 2025
- One-line PR description: add error handling strict mode
- Issue link: Kubernetes VolumeAttributesClass ModifyVolume #3751
- Other comments: Written on top of KEP-3751: add error handling #5482 . I made a separate PR to keep discussions more focused. Please review the last commit.
Also removes some incorrect copies of content. And fixed an link.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: huww98 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* When there is an error, we will only retry with parameters of A or B, not allowing any other target. | ||
User can only change the target after either the modify or the rollback succeeded. | ||
* if the `currentVolumeAttributesClass` is nil, we don't allow rollback. | ||
Instead, we allow change target if the error is infeasible in this case, to allow user to correct some typos easily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure this one will not break the quota integrity. We will need further restriction in CSI spec:
- The SP MUST NOT have applied any modification to the volume as part of this specific call
+ The SP MUST NOT have applied any modification to the volume as part of this specific call or any consecutive previous calls with the same parameters.
But this also troubles me because it means whenever SP updates it logic, it will need to consider capability to any previous version, since there may be some volumes stuck at error state for years. I know this is a very edge case, but without that, we cannot make a 100% promise even in strict mode.
switch spec { | ||
case target, "": return true | ||
case cur: return false | ||
default: return status != "Infeasible" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic may requires more discussion. When current is A, target is B, spec is C, where should we go? Going to C is not allowed because it may break quota.
My current proposal is going to A if ModifyVolume(B) returns infeasible, otherwise continue to retry B, so that it will be more likely to automatically converge to the user specified state.
But maybe we should always retry B to make the system more predictable.
One can verify this contains all states by arbitrarily changing the spec and verify it will still hit a listed state. | ||
|
||
For [strict mode](#strict-mode), the implementation is different: | ||
* We only leave the state `cur == A && target == B` when either modify to A or modify to B success. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The major difference to #5462 . I propose NOT to set target to A before rolling back, to keep tracking B if rollback still fails and maintain quota integrity.
/cc @gnufied |