Closed
Labels
area/machinehealthcheck: Issues or PRs related to machinehealthchecks
kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Description
MachineHealthCheck currently exposes these fields:
// Any further remediation is only allowed if at most "MaxUnhealthy" machines selected by
// "selector" are not healthy.
// +optional
MaxUnhealthy *intstr.IntOrString `json:"maxUnhealthy,omitempty"`
// Any further remediation is only allowed if the number of machines selected by "selector" as not healthy
// is within the range of "UnhealthyRange". Takes precedence over MaxUnhealthy.
// Eg. "[3-5]" - This means that remediation will be allowed only when:
// (a) there are at least 3 unhealthy machines (and)
// (b) there are at most 5 unhealthy machines
// +optional
// +kubebuilder:validation:Pattern=^\[[0-9]+-[0-9]+\]$
UnhealthyRange *string `json:"unhealthyRange,omitempty"`
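To make the documented "[3-5]" semantics concrete, here is a minimal sketch of how such a range string could be parsed and checked. The helpers parseUnhealthyRange and inRange are hypothetical names made up for illustration, not functions from the Cluster API codebase; only the regular expression is taken from the kubebuilder marker above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rangePattern mirrors the kubebuilder validation pattern on UnhealthyRange,
// with capture groups added around the two numbers.
var rangePattern = regexp.MustCompile(`^\[([0-9]+)-([0-9]+)\]$`)

// parseUnhealthyRange is a hypothetical helper that splits "[lo-hi]" into two ints.
func parseUnhealthyRange(s string) (lo, hi int, err error) {
	m := rangePattern.FindStringSubmatch(s)
	if m == nil {
		return 0, 0, fmt.Errorf("unhealthyRange %q does not match %s", s, rangePattern)
	}
	if lo, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	// Assumption: a range whose lower bound exceeds its upper bound is rejected.
	if lo > hi {
		return 0, 0, fmt.Errorf("unhealthyRange %q: lower bound %d exceeds upper bound %d", s, lo, hi)
	}
	return lo, hi, nil
}

// inRange reports whether the unhealthy count lies within [lo, hi], inclusive.
func inRange(unhealthy, lo, hi int) bool {
	return unhealthy >= lo && unhealthy <= hi
}

func main() {
	lo, hi, err := parseUnhealthyRange("[3-5]")
	if err != nil {
		panic(err)
	}
	for _, n := range []int{2, 3, 5, 6} {
		fmt.Printf("unhealthy=%d allowed=%t\n", n, inRange(n, lo, hi))
	}
}
```

With "[3-5]", counts of 3 and 5 are inside the range while 2 and 6 are outside, matching conditions (a) and (b) in the field comment.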
At first glance, the fields seem to control remediation, and the comments suggest as much. In reality, they only control whether the conditions are set; they do not determine when health checks fail or when remediation should occur. This behavior is confusing to most users and counterintuitive at best.
For example:
Say I have 10 Machines in my cluster, and 5 become unhealthy for some reason. If the knobs above are set to allow only 20% (or 2), the MachineHealthCheck stops setting the condition after 2 Machines have been marked and continues if and only if the rest of the Machines. In reality, 5 Machines are unhealthy, but only 2 are marked as such.
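The arithmetic in this example can be sketched as follows. resolveMaxUnhealthy is a hypothetical stand-in for resolving an int-or-percent value such as maxUnhealthy against the number of selected Machines; it is not the Cluster API implementation, and rounding percentages down is an assumption (it does not affect this case, since 20% of 10 is exactly 2).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxUnhealthy turns a value such as "2" or "20%" into an absolute
// Machine count. Assumption: percentages are rounded down.
func resolveMaxUnhealthy(value string, total int) (int, error) {
	if strings.HasSuffix(value, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", value, err)
		}
		return p * total / 100, nil
	}
	return strconv.Atoi(value)
}

func main() {
	const total, unhealthy = 10, 5

	limit, err := resolveMaxUnhealthy("20%", total)
	if err != nil {
		panic(err)
	}
	// 20% of 10 Machines resolves to 2, so 5 unhealthy Machines exceed the limit.
	fmt.Printf("limit=%d unhealthy=%d withinLimit=%t\n", limit, unhealthy, unhealthy <= limit)
}
```

The sketch only shows how the limit resolves to 2 and that 5 unhealthy Machines exceed it; what the controller then does with the remaining Machines is the behavior this issue describes.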