
zac-nixon
Collaborator

Description

This is a long one because it involves:
1/ Setting up a new event handler for Reference Grant events
2/ Refactoring the existing logic in the listener builder to support filtering out listeners that have no backends.
3/ Refactoring the route util logic so it does not fail in various situations [Invalid Svc, Invalid Ref Grant]
4/ Refactoring the E2E tests to allow for reference grant tests, and to allow more in-depth validations for the Target Groups created.

I have broken down the changes into four commits:

reference grant event handler

Adds the event handler to respond to ReferenceGrant changes. The handler will send route events to any routes that are listed in the ReferenceGrant in order to trigger a reconcile. To support route deletions, we calculate routes that exist in the new reference grant OR in the old reference grant.
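The "new OR old" calculation can be sketched as a set union. This is a minimal illustration only: `RouteRef`, `Grant`, and `affectedRoutes` are hypothetical simplified names, not the types used in this PR, and the real handler works on Gateway API objects.

```go
package main

import (
	"fmt"
	"sort"
)

// RouteRef is a hypothetical, simplified stand-in for a "from" entry of a ReferenceGrant.
type RouteRef struct {
	Kind      string
	Namespace string
}

// Grant is a hypothetical, simplified stand-in for a ReferenceGrant.
type Grant struct {
	From []RouteRef
}

// affectedRoutes returns the de-duplicated union of the route references named
// in the old and the new grant. Either grant may be nil (create or delete events).
func affectedRoutes(oldGrant, newGrant *Grant) []RouteRef {
	seen := make(map[RouteRef]struct{})
	for _, g := range []*Grant{oldGrant, newGrant} {
		if g == nil {
			continue
		}
		for _, ref := range g.From {
			seen[ref] = struct{}{}
		}
	}
	out := make([]RouteRef, 0, len(seen))
	for ref := range seen {
		out = append(out, ref)
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Kind < out[j].Kind
	})
	return out
}

func main() {
	oldGrant := &Grant{From: []RouteRef{{"HTTPRoute", "team-a"}, {"TCPRoute", "team-b"}}}
	newGrant := &Grant{From: []RouteRef{{"HTTPRoute", "team-a"}}}
	// team-b was removed from the grant, but it still needs a reconcile.
	for _, r := range affectedRoutes(oldGrant, newGrant) {
		fmt.Printf("enqueue %s routes in namespace %s\n", r.Kind, r.Namespace)
	}
}
```

Taking the union rather than only the new grant's entries is what lets a route that just lost its permission get reconciled.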

TODO: This should probably be done conditionally, if the user installed the ReferenceGrant CRD. But this PR is big enough as-is.

refactor build listener to support ref grants

This commit refactors the listener builder to handle listeners with 0 backends. In the case of ALB, this means throwing out any rules that have 0 actions and using the default 404 action. In the case of NLB, this means not provisioning the listener at all.
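As a rough sketch of the filtering described above, and not the actual builder code: `LBType`, `Rule`, `Listener`, and `pruneListener` are hypothetical names, and modelling NLB backends as rules is a simplification made only for this example.

```go
package main

import "fmt"

// LBType distinguishes the two load balancer flavors.
type LBType string

const (
	ALB LBType = "application"
	NLB LBType = "network"
)

// Rule is a hypothetical, simplified rule: a name plus its resolved backends.
type Rule struct {
	Name     string
	Backends []string
}

// Listener is a hypothetical, simplified listener model.
type Listener struct {
	Port  int
	Rules []Rule
}

// pruneListener drops rules that have no backends. It returns the pruned
// listener and whether the listener should be provisioned at all.
// ALB listeners are always kept (unmatched traffic falls through to the
// listener's default action); NLB listeners with nothing left are skipped.
func pruneListener(lbType LBType, l Listener) (Listener, bool) {
	kept := make([]Rule, 0, len(l.Rules))
	for _, r := range l.Rules {
		if len(r.Backends) > 0 {
			kept = append(kept, r)
		}
	}
	if lbType == NLB && len(kept) == 0 {
		return Listener{}, false
	}
	return Listener{Port: l.Port, Rules: kept}, true
}

func main() {
	l := Listener{Port: 80, Rules: []Rule{{Name: "revoked"}, {Name: "ok", Backends: []string{"svc-a"}}}}
	if pruned, ok := pruneListener(ALB, l); ok {
		fmt.Printf("ALB listener %d keeps %d rule(s)\n", pruned.Port, len(pruned.Rules))
	}
	empty := Listener{Port: 443, Rules: []Rule{{Name: "revoked"}}}
	if _, ok := pruneListener(NLB, empty); !ok {
		fmt.Println("NLB listener 443 is not provisioned")
	}
}
```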

refactor route utils to support ref grant

This commit refactors the route util so that loading errors (raised when we load the corresponding service for a backend ref) are classified as either warning or fatal errors. A warning error leads to the route's status being updated, but it does not cause the reconcile to fail; to the model builder, it is as if the route doesn't exist. Fatal errors are produced during a Kubernetes API outage and cause the reconcile cycle to fail.

The warning / fatal logic is used particularly for reference grant revocation, as we need to materialize the new LB state to remove the backend when a reference grant is revoked. I also extended this logic to non-existent services to make the reconcile logic more robust.
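The warning / fatal split could look roughly like the following. This is a sketch under assumptions: `loadError`, `Route`, `loadRoutes`, and `demoLookup` are hypothetical names for illustration and do not mirror the route util's real signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// loadError is a hypothetical error wrapper carrying a severity flag.
type loadError struct {
	err   error
	fatal bool
}

// Route is a hypothetical, simplified route with named backend services.
type Route struct {
	Name     string
	Backends []string
}

// lookupFunc resolves a backend service; nil means it loaded fine.
type lookupFunc func(service string) *loadError

// loadRoutes returns the routes that loaded cleanly, a status message for each
// route skipped because of a warning, and an error only for fatal failures.
func loadRoutes(routes []Route, lookup lookupFunc) ([]Route, map[string]string, error) {
	var loaded []Route
	statuses := make(map[string]string)
routeLoop:
	for _, r := range routes {
		for _, b := range r.Backends {
			lerr := lookup(b)
			if lerr == nil {
				continue
			}
			if lerr.fatal {
				// Fatal: abort the whole reconcile.
				return nil, nil, fmt.Errorf("loading route %q: %w", r.Name, lerr.err)
			}
			// Warning: record a status and hide the route from the model builder.
			statuses[r.Name] = lerr.err.Error()
			continue routeLoop
		}
		loaded = append(loaded, r)
	}
	return loaded, statuses, nil
}

// demoLookup simulates one healthy service, one revoked grant, and an API outage.
func demoLookup(service string) *loadError {
	switch service {
	case "svc-ok":
		return nil
	case "svc-revoked":
		return &loadError{err: errors.New("reference grant revoked")}
	default:
		return &loadError{err: errors.New("API server unavailable"), fatal: true}
	}
}

func main() {
	routes := []Route{
		{Name: "r1", Backends: []string{"svc-ok"}},
		{Name: "r2", Backends: []string{"svc-revoked"}},
	}
	loaded, statuses, err := loadRoutes(routes, demoLookup)
	fmt.Println(len(loaded), statuses, err)
}
```

Because the revoked route is dropped rather than failing the reconcile, the model can still be rebuilt without its backend.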

reference grant e2e tests

This commit fixes some bugs found in the ALB path for Reference Grants and introduces e2e test logic for ALB ip / instance tests and NLB ip / instance tests.

E2E test results:

Ran 12 of 12 Specs in 6967.095 seconds
SUCCESS! -- 12 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Jul 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zac-nixon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added these labels on Jul 16, 2025:
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
@k8s-ci-robot removed the needs-rebase label on Jul 17, 2025
@shuqz
Collaborator

shuqz commented Jul 18, 2025

I still have the e2e commit to go, but I left comments on each of the other 3 commits.

By("removing ref grant", func() {
	err := auxiliaryStack.DeleteReferenceGrants(ctx, tf)
	Expect(err).NotTo(HaveOccurred())
	// Give some time to have the reference grant to be deleted
Collaborator


Why do we need to wait for 2 minutes for deletion?

Collaborator Author


This gives the ELB control plane (CP) time to propagate the change to the ELB data plane (DP).

Collaborator


Got it.

@@ -120,6 +121,37 @@ func IsHostNameInValidFormat(hostName string) (bool, error) {
return true, nil
}

func attachedRulesAccumulator[RuleType any](ctx context.Context, k8sClient client.Client, route preLoadRouteDescriptor, rules []RuleType, backendRefIterator func(RuleType) []gwv1.BackendRef, ruleConverter func(*RuleType, []Backend) RouteRule) ([]RouteRule, []routeLoadError) {
Collaborator


are we using this function anywhere?

Collaborator Author


ah no, this was my first iteration of the accumulator. will remove

@@ -220,11 +233,17 @@ func (l listenerBuilderImpl) buildListenerRules(stack core.Stack, ls *elbv2model
// TODO: add case for GRPC
}

if len(actions) == 0 {
l.logger.Info("Filling in no backend actions with fixed 503")
Collaborator


can we add some info related to which rule?

}

if backendRef.Weight != nil && *backendRef.Weight == 0 {
-	return nil, nil
+	return nil, nil, nil
Collaborator


Sorry I missed this earlier. It looks like we are skipping backends with zero weight. However, ALB allows target groups with weight 0 in a weighted forward action. Do we still want to keep these zero-weight backends to match that behavior?

Collaborator Author


In the Gateway API, weight 0 means do not forward to this backend:

	// If only one backend is specified and it has a weight greater than 0, 100%
	// of the traffic is forwarded to that backend. If weight is set to 0, no
	// traffic should be forwarded for this entry. If unspecified, weight
	// defaults to 1.

Collaborator Author

@zac-nixon zac-nixon Jul 21, 2025


While diving for that blurb I did realize there is a mismatch between GW API and ALB. GW API allows weights from 0 to 1,000,000, while ALB only allows up to 999.

I can add that validation in an upcoming PR.
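For illustration only, such a check might look like the sketch below. `validateWeight` and `maxALBWeight` are hypothetical names, and this is not the validation that was later implemented; it only encodes the ALB upper bound of 999 mentioned above.

```go
package main

import "fmt"

// maxALBWeight is the ALB upper bound for a target group weight, per the comment above.
const maxALBWeight = 999

// validateWeight rejects weights that ALB cannot represent.
// Hypothetical helper, written only to illustrate the proposed validation.
func validateWeight(weight int32) error {
	if weight < 0 || weight > maxALBWeight {
		return fmt.Errorf("weight %d is outside the range 0-%d supported by ALB", weight, maxALBWeight)
	}
	return nil
}

func main() {
	for _, w := range []int32{0, 999, 1000} {
		fmt.Printf("weight %d: %v\n", w, validateWeight(w))
	}
}
```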

Collaborator


Good catch on the validation. But as per our offline discussion, let's allow these zero-weighted backend refs, to support on-the-fly traffic switching and to avoid the delay from target group creation and target registration. We will also handle the edge case where a forward action has a single target group with zero weight: we will treat it as if no backend ref was found and add a 503 fixed-response action. This is because the Gateway API and ALB behave differently for a single zero-weight target group on a forward action: the Gateway API says no traffic should be sent to it, while ALB sends it 100% of the traffic.
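The agreed behavior can be sketched as follows. This is a simplified illustration, assuming hypothetical `Backend`, `Action`, and `buildAction` names; the real listener builder produces elbv2 model actions and is not shown here.

```go
package main

import "fmt"

// Backend is a hypothetical, simplified backend: a target group name and weight.
type Backend struct {
	Name   string
	Weight int
}

// Action is a hypothetical, simplified ALB rule action.
type Action struct {
	Type       string // "forward" or "fixed-response"
	StatusCode int    // only set for fixed-response
	Targets    []Backend
}

// buildAction keeps zero-weight backends in multi-target forward actions, but
// treats "no backends" and "a single zero-weight backend" as having nothing to
// forward to, returning a fixed 503 instead.
func buildAction(backends []Backend) Action {
	if len(backends) == 0 || (len(backends) == 1 && backends[0].Weight == 0) {
		return Action{Type: "fixed-response", StatusCode: 503}
	}
	return Action{Type: "forward", Targets: backends}
}

func main() {
	fmt.Println(buildAction([]Backend{{Name: "tg-a", Weight: 0}}).Type)
	fmt.Println(buildAction([]Backend{{Name: "tg-a", Weight: 0}, {Name: "tg-b", Weight: 100}}).Type)
}
```

Keeping the zero-weight target group in the multi-target case means it already exists with registered targets when its weight is later raised.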

@@ -32,24 +33,72 @@ type Backend struct {
Weight int
}

type attachedRuleAccumulator[RuleType any] interface {
Collaborator


This is a great refactor. Pure genius. :-)

@shraddhabang
Collaborator

/lgtm
/approved

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Jul 21, 2025
@zac-nixon zac-nixon merged commit 011edcd into kubernetes-sigs:main Jul 21, 2025
9 checks passed