-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[KEP-4680] Update README to include configurable HealhCheckTimeout #5476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… message - Added health_check_timeout_seconds field to DeviceHealth message - Updated documentation to reflect that timeout is now configurable per device - Changed Beta graduation criteria from 'implement' to 'verify' since feature is now included in initial design - Addresses PR feedback about DRA API for timeout configuration Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
a751de7
to
8279c43
Compare
// Health check timeout duration in seconds for this device. | ||
// If not specified or zero, Kubelet will use a default timeout. | ||
// Optional. | ||
int64 health_check_timeout_seconds = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the negative value or 0
mean? Let's specify in the field description.
// Health check timeout duration in seconds for this device. | ||
// If not specified or zero, Kubelet will use a default timeout. | ||
// Optional. | ||
int64 health_check_timeout_seconds = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq for PR review I think - is Duration field OK to use in our APIs? Or this is not recommended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
we need to specify the behvior when negative value was set. Can do it during implementation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ArangoGutierrez, SergeyKanzhelev The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @mrunalp |
Implements device-specific health check timeouts in the DRA health monitoring system as defined in KEP-4680. This allows DRA drivers to specify custom timeout values for individual devices through the gRPC health API. Changes: - Add HealthCheckTimeout field to state.DeviceHealth struct to store device-specific timeout durations - Add health_check_timeout_seconds field to DeviceHealth proto message in the DRA health gRPC API (v1alpha1) - Update manager.go to extract timeout from gRPC responses and apply DefaultHealthTimeout (30s) when not specified - Handle negative timeout values defensively by logging a warning and falling back to the default timeout - Simplify healthinfo.go by removing redundant fallback logic since timeouts are now always set at creation time - Update tests to include HealthCheckTimeout in test fixtures The timeout behavior is: - Positive values: Use the specified timeout in seconds - Zero or unspecified: Use DefaultHealthTimeout (30 seconds) - Negative values: Log warning and use DefaultHealthTimeout This implementation provides flexibility for DRA drivers to define appropriate health check intervals for different device types while maintaining backward compatibility through sensible defaults. Ref: KEP-4680 (Add Resource Health to Pod Status) Ref: kubernetes/enhancements#5476 Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Implements device-specific health check timeouts in the DRA health monitoring system as defined in KEP-4680. This allows DRA drivers to specify custom timeout values for individual devices through the gRPC health API. Changes: - Add HealthCheckTimeout field to state.DeviceHealth struct to store device-specific timeout durations - Add health_check_timeout_seconds field to DeviceHealth proto message in the DRA health gRPC API (v1alpha1) - Update manager.go to extract timeout from gRPC responses and apply DefaultHealthTimeout (30s) when not specified - Handle negative timeout values defensively by logging a warning and falling back to the default timeout - Simplify healthinfo.go by removing redundant fallback logic since timeouts are now always set at creation time - Update tests to include HealthCheckTimeout in test fixtures The timeout behavior is: - Positive values: Use the specified timeout in seconds - Zero or unspecified: Use DefaultHealthTimeout (30 seconds) - Negative values: Log warning and use DefaultHealthTimeout This implementation provides flexibility for DRA drivers to define appropriate health check intervals for different device types while maintaining backward compatibility through sensible defaults. Ref: KEP-4680 (Add Resource Health to Pod Status) Ref: kubernetes/enhancements#5476 Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Implements device-specific health check timeouts in the DRA health monitoring system as defined in KEP-4680. This allows DRA drivers to specify custom timeout values for individual devices through the gRPC health API. Changes: - Add HealthCheckTimeout field to state.DeviceHealth struct to store device-specific timeout durations - Add health_check_timeout_seconds field to DeviceHealth proto message in the DRA health gRPC API (v1alpha1) - Update manager.go to extract timeout from gRPC responses and apply DefaultHealthTimeout (30s) when not specified - Handle negative timeout values defensively by logging a warning and falling back to the default timeout - Simplify healthinfo.go by removing redundant fallback logic since timeouts are now always set at creation time - Update tests to include HealthCheckTimeout in test fixtures The timeout behavior is: - Positive values: Use the specified timeout in seconds - Zero or unspecified: Use DefaultHealthTimeout (30 seconds) - Negative values: Log warning and use DefaultHealthTimeout This implementation provides flexibility for DRA drivers to define appropriate health check intervals for different device types while maintaining backward compatibility through sensible defaults. Ref: KEP-4680 (Add Resource Health to Pod Status) Ref: kubernetes/enhancements#5476 Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Implements device-specific health check timeouts in the DRA health monitoring system as defined in KEP-4680. This allows DRA drivers to specify custom timeout values for individual devices through the gRPC health API. Changes: - Add HealthCheckTimeout field to state.DeviceHealth struct to store device-specific timeout durations - Add health_check_timeout_seconds field to DeviceHealth proto message in the DRA health gRPC API (v1alpha1) - Update manager.go to extract timeout from gRPC responses and apply DefaultHealthTimeout (30s) when not specified - Handle negative timeout values defensively by logging a warning and falling back to the default timeout - Simplify healthinfo.go by removing redundant fallback logic since timeouts are now always set at creation time - Update tests to include HealthCheckTimeout in test fixtures The timeout behavior is: - Positive values: Use the specified timeout in seconds - Zero or unspecified: Use DefaultHealthTimeout (30 seconds) - Negative values: Log warning and use DefaultHealthTimeout This implementation provides flexibility for DRA drivers to define appropriate health check intervals for different device types while maintaining backward compatibility through sensible defaults. Ref: KEP-4680 (Add Resource Health to Pod Status) Ref: kubernetes/enhancements#5476 Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Update KEP 4680 README to include configurable HealthCheck into the DRA API