Skip to content

Conversation

M00nF1sh
Copy link
Collaborator

@M00nF1sh M00nF1sh commented Mar 26, 2025

This PR refactor and enhances the subnet discovery module as follows

  1. Added a fallback mechanism to find subnets in VPC based on reachability(public/subnet) that matches ELB's requirement.

    • this provides a way to remove the requirement of "kubernetes.io/role" tag on subnets by default.
    • this fallback mechanism requires new IAM permission (ec2:DescribeRouteTables) to work.
    • this fallback mechanism can be turned off by SubnetDiscoveryByReachability feature flag. We have enabled this featureFlag by default. It shall have no impact to customers since this fallback only invokes when no subnets with role tag found. (even cx forgets to add the new IAM permission during upgrade, subnet discovery will work just as before)
  2. Refactor the subnetResolver to be more clear and have consistent behavior in multiple scenarios.

    • moved the "minimalIPAddressCount"/"albSingleSubnet"/"SubnetClusterTagCheck" from resolveOptions into fields of subnetResolver.
      • Those are per cluster/controller settings, instead of need to be set per ALB/NLB
      • Currently, some restrictions are not enforced in certain scenarios and caused inconsistency. e.g. when DiscoveryViaSelector, the minimalIPAddressCount is not validated against subnets.
    • refactor the logic within subnetResolver to be clear and split logic into smaller reusable chunks.
    • made the validations consistent for resolveViaSelector(ID) to resolveViaNameOrIDSlice. (basically this TODO)
  3. Rendered doc:
    doc

Issue

Description

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: M00nF1sh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 26, 2025
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 26, 2025
@M00nF1sh M00nF1sh force-pushed the subnet-reachability branch from 0bb20f7 to 598e0e6 Compare March 27, 2025 22:10
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 27, 2025
@M00nF1sh M00nF1sh force-pushed the subnet-reachability branch 3 times, most recently from 3bff4f6 to 1140a0d Compare March 28, 2025 00:31
@M00nF1sh M00nF1sh changed the title Enhance subnet discovery by fallback to find subnets in VPC whose reachability(public/subnet). Enhance subnet discovery Mar 28, 2025
@M00nF1sh M00nF1sh force-pushed the subnet-reachability branch from 1140a0d to 707ad82 Compare March 28, 2025 00:41
@M00nF1sh M00nF1sh force-pushed the subnet-reachability branch from 707ad82 to 2f6ce5c Compare March 28, 2025 00:45
@M00nF1sh M00nF1sh requested a review from zac-nixon March 28, 2025 00:45
// - The subnet has a Kubernetes cluster tag matching the current cluster
clusterTagCheckEnabled bool
// whether to enable a single subnet as ALB subnet
// by default ALB requires two subent, only whitelisted users can use a single subnet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit whitelisted -> allowlisted

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

req := &ec2sdk.DescribeSubnetsInput{
Filters: []ec2types.Filter{
{
Name: awssdk.String("vpc-id"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you re-define this query names multiple times. maybe use a constant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

if err != nil {
return nil, err
}
resolvedSubnets = append(resolvedSubnets, subnets...)
}
if len(resolvedSubnets) != len(subnetNameOrIDs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove this check? in the other function that queries for subnets by ID, you actually added this exact check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, it's due to i moved this check into the listSubnetsByIDs and listSubnetsByNames
The length must equal when listSubnetsByIDs and listSubnetsByNames didn't return error

chosenSubnets = append(chosenSubnets, subnets[0])
}
}
sortSubnetsByID(chosenSubnets)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this sort mess up the EIP allocation feature? #3925

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't
This sort only triggers for "Discovery" via tag selector or role tag or reachability.
When subnet explicitly specified, no sort will occur

@zac-nixon
Copy link
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2025
@wweiwei-li
Copy link
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 539245d into kubernetes-sigs:main Mar 28, 2025
9 checks passed
k8s-ci-robot added a commit that referenced this pull request Mar 31, 2025
fix bug(introduced by #4109) in subnet resolver
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants