-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-4742: support node topologies in downward API #5146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c7bbaea to
ac50d21
Compare
|
Some notes from discussion this afternoon: We don't really want per-node config because a pod might behave differently on different nodes (and because back-rev nodes are a thing). We don't really want async copying from node->pod because env vars can only be set at pod start. So we need a synchronous mechanism, which is why @munnerz went with Binding. We don't want every scheduler to be responsible for setting topology labels, so we @munnerz added admission control. I have heard now from folks who would like to expose OTHER node labels at the same time, at the discretion of the cluster admin, not the pod. So why don't we just merge ALL of the We can offer a built-in admission e.g. "PodTopologyLabels" which copies the specific ones listed in this KEP. If you are a cluster admin or operator and you want to expose labels to pods, add admission control on It's not a terrible mechanism, IMO. No magic virtual labels in kubelet. No special copying labels to status. Just a plain old merge of plain old data. |
|
I have a one more question: this KEP proposes that ALL OF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this. If we want to make it easy, let's publish a container image that people can use to make the Node topology information as data within an emptyDir (or similar). That approach has zero in-tree code to maintain, and works.
We could also consider kubernetes/kubernetes#116993 as a mechanism to make the output from the init container available as the environment variables of a container (avoids customizing the app container image).
|
|
||
| ## Implementation History | ||
|
|
||
| - the `Summary`, `Motivation`, `Design Details` sections being merged, signaling SIG acceptance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no backticks needed
This isn't implementation history.
c83eba4 to
f62b038
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve for API
/lgtm
Needs sig-node and PRR
|
|
||
| ### Goals | ||
|
|
||
| * Values from Node labels `topology.k8s.io/zone`, `topology.k8s.io/region` and `kubernetes.io/hostname` are made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you dropping the *.topology.k8s.io affordance from this KEP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On one hand, less scope is more gooder.
On the other hand, I thought it was expressly part of the original KEP that this be more extensible? Yes, users can install a mutating admission - is that oo heavyweight for this? How common will extensibility be?
I guess we can wait and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I decided to only include the standard labels for now as it's safer and doing *.topology.k8s.io/* doesn't seem that much more extensible than what we doing now. Happy to revisit this after alpha once we get some user feedback.
| * A feature gate, `PodTopologyLabelsAdmission` will be introduced in v1.33. Alpha and disabled by default. | ||
| The `PodTopologyLabels` admission plugin can only be set when this feature gate is enabled. | ||
| * The Binding REST implementation will be updated to copy all labels from `pods/binding` subresource into Pods. | ||
| At this point we do not remove labels, it is assumed that any labels in Binding are safe to expose via Downward API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any label key in the Binding already exists in the pod, it will be dropped and will NOT overwrite the pod's value.
|
I've suggested recording the alternatives we've discounted. We should do that unless the suggestions - eg #5146 (comment) - are really off target. @andrewsykim would you be willing to revise the KEP to record those discarded alternative approaches? |
50f8860 to
bf8dcb4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see if there's a scheduling metric we can watch, in case some bugs in this code cause failures during binding.
|
|
||
| ###### How can a rollout or rollback fail? Can it impact already running workloads? | ||
|
|
||
| Failure on rollout is unlikely since we do not override any existing topolgoy labels on Pods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/topolgoy/topology/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
|
||
| ###### How can an operator determine if the feature is in use by workloads? | ||
|
|
||
| Look for topology labels in Pod labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually no user control over the feature so this question is N/A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| ## Implementation History | ||
|
|
||
| - the Summary, Motivation, Design Details sections being merged, signaling SIG acceptance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other KEPs for the relevant type of information to put in this section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
152260c to
b774f6d
Compare
|
Given there is no API surface, really, I am not sure my vote counts, but I am LGTM :) |
|
|
||
| * A built-in Kubernetes admission plugin, `PodTopologyLabels` will be introduced in kube-apiserver | ||
| * The `PodTopologyLabels` admission plugin is responsible for mutating `pods/binding` subresource, adding topology labels matching the target Node. | ||
| * `PodTopologyLabels` admission will overwrite `topology.k8s.io/*` labels on Pods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of them? Or just those it plans to copy? I think we need to only overwrite what we plan to set - this way customer's mutating webhook is not affected and can copy additional labels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goals section details topology.k8s.io/zone, topology.k8s.io/region and kubernetes.io/hostname are the only label values that will be copied in the admission plugin (these will be copied onto the Binding).
The registry is what is responsible for copying labels from the (mutated) binding onto the Pod resource (at the same time as it persists the assigned spec.nodeName).
Based on The Binding REST implementation will be updated to copy all labels from pods/binding subresource into Pods. (below), the registry will be enhanced to always copy across all labels from the Binding onto the Pod (regardless whether topology.k8s.io or example.com), however will only overwrite the 3 known zone, region and hostname labels (as per At this point we will overwrite Pod labels in Binding that are allowed to be exposed via Downward API. below).
I do agree we need to ensure we are water-tight on defining what and where we expose and copy, as service providers will build upon these behaviours quickly.
| * The `PodTopologyLabels` admission plugin is responsible for mutating `pods/binding` subresource, adding topology labels matching the target Node. | ||
| * `PodTopologyLabels` admission will overwrite `topology.k8s.io/*` labels on Pods. | ||
| * A feature gate, `PodTopologyLabelsAdmission` will be introduced in v1.33. Alpha and disabled by default. | ||
| The `PodTopologyLabels` admission plugin can only be set when this feature gate is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The `PodTopologyLabels` admission plugin can only be set when this feature gate is enabled. | |
| The `PodTopologyLabels` admission plugin will be enabled by default when this feature gate is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits but generally LGTM for PRR.
#4742 needs lead-opted-in and v1.33 milestone so release team tracks this
|
|
||
| ## Implementation History | ||
|
|
||
| - - `v1.33`: initial KEP is accepeted and alpha implementation is complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: formatting here
| items: | ||
| - path: "zone" | ||
| fieldRef: | ||
| fieldPath: metadata.labels['topology.kubernetes.io/zone'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion, thanks Sergey
8cfe60a to
d105e2d
Compare
Signed-off-by: Andrew Sy Kim <[email protected]> Co-authored-by: docandrew <[email protected]>
d105e2d to
e09e1f7
Compare
|
/approve based on Tim and Sergey's review. |
|
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve for PRR
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andrewsykim, dchen1107, johnbelamaric, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| ## Proposal | ||
|
|
||
| Retreival of topology information will be achieved through the existing mechanism of passing down Pod labels using Downward API. | ||
| A new admission plugin, `PodTopologyLabels`, will be added to kube-apiserver. When enabled, it will mutate the `pods/binding` subresource, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like there are two separate (yet related) questions:
- Whether to enable
PodTopologyLabelsthe admission by default - Whether to allow certain pods to opt in or out this feature.
One potential issue I see is that if we do this by default for all pods, it might cause problems for users/pods that do not want this feature for whatever reason. And we're unlikely to get feedback from them until this is made the default.
| ##### e2e tests | ||
|
|
||
| E2E tests will be added to test the following scenarios: | ||
| * Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and not contain any additional labels
| E2E tests will be added to test the following scenarios: | ||
| * Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled. | ||
| * A Pod using downward API to retrieve the underlying topology information about the Node | ||
| * A Pod attempting to use downward API to retrieve Node labels that are not the standard topology labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a test case of consuming topology labels via downward API when the some labels are not set
|
I wonder if we'd like to suggest a curated ValidatingAdmissionPolicy that rejects problem Pods. Eg pod is labelled as |
Funny enough, this already is done for annotations here: https://github.com/kubernetes/kubernetes/blob/63ef2dd03de1560ace32e145c531e1148a54f827/pkg/registry/core/pod/storage/storage.go#L242-L247 So technically speaking it should be possible today to implement just the admission plugin, with the one caveat that the labels must be mapped to annotations, creating a confusing user experience. That said, with label selectors often being used to configure things like traffic routing (which in some service meshes also encompassing service identity), it does feel slightly less scary to copy the scheduler and/or admission set values as annotations instead... |
|
@andrewsykim can you update the title of this PR to reflect the correct KEP number? (4742, not 4724!) |
Uh oh!
There was an error while loading. Please reload this page.