Skip to content

Conversation

nikos445
Copy link

@nikos445 nikos445 commented Aug 25, 2025

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nikos445
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from dchen1107 August 25, 2025 20:39
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Aug 25, 2025
@k8s-ci-robot k8s-ci-robot requested a review from mrunalp August 25, 2025 20:39
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 25, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @nikos445!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 25, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @nikos445. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment on lines +365 to +376
// Get remote image id (digest)
remoteImageRef, _ := GetRemoteImageDigestWithoutPull(ctx, requestedImage, pullSecrets)

// Compare digests if not match then pull image from remote
if remoteImageRef != imageRef {
msg = fmt.Sprintf("Remote image for %q changed (local=%s, remote=%s), will pull new one.", requestedImage, imageRef, remoteImageRef)
m.logIt(objRef, v1.EventTypeWarning, events.FailedToInspectImage, logPrefix, msg, klog.Warning)
return "", msg, err
}

msg = fmt.Sprintf("Remote image for %q is exactly the same (local=%s, remote=%s), using existing one.", requestedImage, imageRef, remoteImageRef)
m.logIt(objRef, v1.EventTypeWarning, events.FailedToInspectImage, logPrefix, msg, klog.Warning)
Copy link
Contributor

@everpeace everpeace Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about how this simple digest equality check can guarantee the image on registry is newer one.

If I understood correctly, there is basically no concept of newer/older over container images, and container images are just identified with a digest hash at the bottom.

Thus, I think there could be complicated scenarios (e.g., rollback operation, registry was restored from backup after some severe failures, etc.) that might not fit with the meaning of Newer.

So, we could consider better name for this policy behaviour? For example, IfNoUpdatedPresent? I know this is not a good one though...

Copy link
Author

@nikos445 nikos445 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion, how about: IfDiffNotPresent ? Since it checks for difference in reality. Reverting from backup hasn't crossed my mind that might be older so 'newer' is not right keyword, you are right.

Copy link
Contributor

@everpeace everpeace Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem is that if this policy can accept older images ( I imagined the case which the registory was restored from backup), in such a case, the policy could not achieve backported/pached image user-story (Story 2).

In this case, even backported/pached image was already mirrored on the node, older image(not backporeted/patched) can be pulled from the registry...

Changing the policy name could fit with this behavior. But, is this behavior really desired and wanted?

Thus, now, I'm thinking that it's not a good idea to support such anti-pattern practice, as discussed in kubernetes/kubernetes#133680.

As discussed in kubernetes/kubernetes#133680 and you are already aware, Always policy can cover most user stories described in the KEP. The exception case is Always policy with "cached locally but non-exist on registry" case as your described in slack( https://kubernetes.slack.com/archives/C0BP8PW9G/p1756226046849689?thread_ts=1756224253.541639&cid=C0BP8PW9G).

So, how about elaborating this KEP to focus on the case??


how about: IfDiffNotPresent ?

how about IfDigestChanged?? We don't need to use the word of Present, right?

Copy link
Author

@nikos445 nikos445 Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would like mostly to focus on that use case where pods may have locally an image and not starting because Always failed to pull.

This is exactly what I try to implement here and if solved in policy Always, then no need to have a separate IfNewerNotPresent policy. Also we could implement something to check digest with Always before pulling, to make it even faster than trying to verify all layers (but up to KEP approval)

I just started this discussion with another imagePullPolicy since I wanted to not modify core Kubernetes functionality.

Copy link
Contributor

@everpeace everpeace Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we could implement something to check digest with Always before pulling to make it even faster than trying to verify all layers (but up to KEP approval)

Sounds nice! I'd say this could separate into two independent KEPs:

  • Make Always faster by just checking its digest
    • This would also need KEP, but this would be ok to update only CRI
    • no user facing changes (just speedup), right?
  • Make Always resilient for the case which "cached locally, but not-exist on remote"
    • This would introduce a new image pull policy because it's a behavioral change

I just started the discussion with another imagePullPolicy since I wanted to not modify core Kubernetes functionality.

could you kindly give me a pointer to this discussion? I'd like to watch :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry its a typo "this discussion" instead of "the discussion" :) I meant this KEP-5497

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry forgot to mention, I like your suggestion "IfDigestChanged"


This KEP proposes the introduction of a new `imagePullPolicy` mode named `IfNewerNotPresent`. The desired outcome is to provide a more efficient and
predictable image-pulling strategy that ensures workloads use the latest available image when updates exist, while avoiding unnecessary downloads and
startup delays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the image repository is unreachable?

  • With Always, the Pod fails to start.
  • With IfNotPresent, the Pod can still start.

Generally, is forcing a pull necessary if the image ID has not changed?

  • Probably not.
  • Would this be considered a bug in Always? (Most likely not.)
  • To me, IfNewerNotPresent feels closer to an AlwaysCheck mode.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, IfNewerNotPresent checks image id if not same as local then it pulls (like Always) if same then it doesnt change anything (like IfNotPresent) and if registry fails or some connection is not possible then it falls back to local image after some retries (like IfNotPresent).

Copy link
Member

@mikebrow mikebrow Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, IfNewerNotPresent checks image id if not same as local then it pulls (like Always) if same then it doesnt change anything (like IfNotPresent) and if registry fails or some connection is not possible then it falls back to local image after some retries (like IfNotPresent).

couple comments...

  1. the terminology is even more confusing than always/ifnotpresent/never .. as you are injecting an extra always pull behavior but reducing it's rule to force a check only if the connection is "currently" available.. if not run pod/container anyway..
  2. Always is also a security check. All you would need to do is pull the ethernet cable to avoid the security check.
  3. prefer to not overload the always pull image policy with a simple rule allows fall back to if not present on bad connection/access denied issues.

I recommend considering a different route. Something that adds a new rule type with directives that decide what to do on connection failure, access denied, image tag does not exist, ... type issues between the local container image service and the image registry. But IMO that can probably only apply to the if not present rule... otherwise always no longer means always. Some clouds run with the injector that changes the image pull policy to always for multi-tenant use.

IMO the if not present determination could be subject to further clarification so long as the default determination is backwards compatible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Mike,
thanks for the feedback. 😀

So as I understand, you suggest to "invent" a new variable in spec like imagePullPolicy for covering the use cases of fallbacks, e.g imagePullFallbackPolicy.

With modes like:

  • if it fails to pull image from remote, then to use local.
  • if it fails, to stay on imagePullBackoff.
  • etc...

Thats a great idea, but this doesn't cover the cases of mutable images.

My idea, about IfNewerNotPresent, was about pulling if images, if Digest doesn't match. Like when someone pushed a "latest" or "staging" tag.

That was the initial suggestion for this rule (IfNewerNotPresent), the secondary idea is the fallback, to Local image, if remote image is not available or if it already exists (like IfNotPresent does).

I suggested a different rule, so we can maintain backwards compatibility and the rules: "Always" and "IfNotPresent" intact. For people that need it.

My use cases described in KEP are not covered fully by Always (due to not checking checksum and fallback) and not at all by the IfNotPresent (doesnt check anything - always use local)

I think your suggestion might be interesting case for another KEP even, but not relevant to this imagePullPolicy mode enhancement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the naming, I am happy to hear any suggestions people have to improve it 😁 I understand its not very clear but I believe good Documentation will solve this matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where I said "image tag does not exist, ..." tag deleted or no longer present on the registry is also a type of image tag mutation. By .. I mean that reason and any other reason / rule for clarifying what present and/or not present means with respect to image tagging and other image meta data, such as present and recent(when was the last time a tag was checked within 1hr, 1day, 1week,...), present and authorized (already covered by ensure secret pulled images), ... Not an exhaustive list. If a very fat image layer is already being downloaded do you want to run the pod on the older image or wait. Would there be a policy for upgrading the local image and using that updated image when it has been retrieved. I believe this was one of the patterns being discussed in the linked issue.

As to not checking checksum.. do you mean you think something is amiss with the runtime's process for verifying SHA digests of the container image manifests, config, layers, ...?

my reason for suggesting a different path is that the image pull policy already has a few hidden dimensions adding another might make it even more complicated and risky...

Copy link
Member

@mikebrow mikebrow Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming is .. hard

maybe ImageResolvePolicy

@pacoxu
Copy link
Member

pacoxu commented Aug 29, 2025

cc @mikebrow @haircommander

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants