Skip to content

Conversation

dshehbaj
Copy link
Member

@dshehbaj dshehbaj commented Jul 30, 2025

What type of PR is this?
feature

Which issue does this PR fix?:

#3067
#2904

What does this PR do / Why do we need it?:

This PR adds a requested customer feature to allow customers to

  1. Include or exclude subnets/security groups by tagging them kubernetes.io/role/cni=1 or kubernetes.io/role/cni=0 respectively.
  2. Apply custom security groups for secondary ENIs created using kubernetes.io/role/cni=1 tagged subnet if the alternate security group is also tagged kubernetes.io/role/cni=1.
  3. Reserve subnets for specific cluster based on the kubernetes.io/cluster/<my-example-cluster> tag. See Enhanced subnet discovery should use configurable tags #2904 more.

Testing done on this change:

  1. Added unit tests and integration tests.
  2. Manual testing pending.

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades? Has updating a running cluster been tested?:

  • Manual testing pending.

Does this change require updates to the CNI daemonset config files to work?:

  • Requires ENABLE_SUBNET_DISCOVERY enabled which is already done by default.

Does this PR introduce any user-facing change?:

  • Users were already tagging their subnets as the part of the original subnet discovery feature. This enhanced feature will also allow users to tag alternate security groups.

  • Make sure that IPv6 documentation and managed IPv4 policy are updated.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Integration tests passing in IPv4 cluster.

image

Currently adapting our test framework to work with IPv6 cluster. Will paste results here.

@dshehbaj dshehbaj requested a review from a team as a code owner July 30, 2025 16:37
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 96310d6 to 0a64e10 Compare July 30, 2025 20:54
@dshehbaj dshehbaj changed the title pkg awsutils: check for primary/secondary subnet exclusion Enhanced Subnet Discovery Jul 30, 2025
// Mark primary ENI as excluded if primary subnet is excluded
if eni == primaryENI && c.isPrimarySubnetExcluded {
log.Infof("Marking primary ENI %s as excluded from pod IP allocation", eni)
err := c.dataStoreAccess.GetDataStore(eniMetadata.NetworkCard).SetENIExcludedForPodIPs(eni, true)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if eniMetadata.NetworkCard is the same network card as 0.

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 6 times, most recently from f1898c7 to 7945b5a Compare August 4, 2025 14:56
Comment on lines +2786 to +2809
name: "IPv6 enabled with subnet discovery and mixed IPv4/IPv6 subnets",
subnets: []ec2types.Subnet{
Copy link
Member Author

@dshehbaj dshehbaj Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check edge case in code if customer is on IPv4 cluster and the code picks up an IPv6 subnet because that's all they had tagged in the console and vice versa.

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 6 times, most recently from 903a226 to 0368ae0 Compare August 4, 2025 16:31
Comment on lines 1239 to 1248
// When subnet discovery is disabled, check if primary subnet is excluded
excluded, checkErr := cache.isPrimarySubnetExcluded()
if checkErr != nil {
// If we can't determine exclusion status, log warning and proceed
log.Warnf("Failed to check if primary subnet is excluded: %v. Proceeding with ENI creation attempt.", checkErr)
} else if excluded {
// Primary subnet is explicitly excluded
err = errors.New("primary subnet is tagged with kubernetes.io/role/cni=0 and subnet discovery is disabled - no valid subnets available for ENI creation")
log.Error(err.Error())
return "", err
}
Copy link
Member Author

@dshehbaj dshehbaj Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks if Cx has a tagged subnet, but have not enable subnet discovery. Do we want to do this check?

Comment on lines 465 to 472
if c.enableIPv6 {
assignedIPs = primaryENIInfo.AssignedIPv6Addresses()
ipType = "IPv6"
} else {
assignedIPs = primaryENIInfo.AssignedIPv4Addresses()
ipType = "IPv4"
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if we can use hasPods() function.

Currently hasPods() only checks for IPv4 assigned addresses, will need to update that func to support IPv6.

Comment on lines +1066 to +1119
// Skip ENIs that are excluded from pod IP allocation
if eni.IsExcludedForPodIPs {
ds.log.Debugf("Skip needs IP check for ENI %s as it is excluded from pod IP allocation", eni.ID)
continue
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will skip the primary ENI in GetAllocatableENIs function and prevent it from getting IPs assigned/allocated

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 0368ae0 to ea0763f Compare August 6, 2025 17:47
@dshehbaj dshehbaj requested a review from haouc August 6, 2025 18:15
assert.False(t, wasUsed, "Primary ENI should be excluded and auto-cleanup prefixes")
}

func TestPrimaryENIAutoCleanupFailureHandling(t *testing.T) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add case for IPv6 mode too

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 3 times, most recently from 587b9f3 to ea0763f Compare August 8, 2025 18:11
describeSGInput := &ec2.DescribeSecurityGroupsInput{
Filters: []ec2types.Filter{
{
Name: aws.String("vpc-id"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need support global SGs which could be in another VPCs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we support VPC peering in CNI. I could not find any existing logic that I could leverage.

I think keeping it local VPC only might keep things simpler, and we could do VPC peering as a follow up.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global SGs is not VPC peering. EC2 support using SG from another VPC.

},
{
Name: aws.String("tag:" + subnetDiscoveryTagKey),
Values: []string{"1"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's make them to const, such as selected = 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

eniIDs = append(eniIDs, cache.getFilteredENIs(ctx, ds, true)...) // only secondary subnet ENIs
}

cache.applySecurityGroupsToENIs(ctx, eniIDs, primarySGs, "Applying primary security groups to")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have error to handle.

return err
}

addedCount, deletedCount := cache.detectSecurityGroupChanges(sgIDs, &cache.securityGroups, "primary")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same arbitary argument being passed through.

primarySubnetENIs := cache.getFilteredENIs(ctx, ds, false)
for _, eniID := range primarySubnetENIs {
// Filter out unmanaged ENIs
if !cache.unmanagedENIs.Has(eniID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel unmanaged ENIs should be handled in cache.getFilteredENIs().

}

// Apply security groups to the filtered ENIs
cache.applySecurityGroupsToENIs(ctx, eniIDs, sgIDs, "Update")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

log.Warnf("Failed to check if primary subnet is excluded: %v. Proceeding with ENI creation attempt.", checkErr)
} else if excluded {
// Primary subnet is explicitly excluded
return "", fmt.Errorf("primary subnet is tagged with kubernetes.io/role/cni=0 - no valid subnets available for ENI creation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest surface this error to customers.

if vpcErr != nil {
log.Warnf("Failed to call ec2:DescribeSubnets: %v", vpcErr)
log.Info("Defaulting to same subnet as the primary interface for the new ENI")

// Even in fallback, check if primary subnet is excluded
excluded, checkErr := cache.IsPrimarySubnetExcluded(ctx)
excluded, checkErr := cache.IsSubnetExcluded(ctx, cache.subnetID)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use primarySubnetID = cache.subnetID

@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 5 times, most recently from 9036276 to a188827 Compare August 18, 2025 16:30
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch 2 times, most recently from f0bc83a to 1d749c9 Compare August 21, 2025 02:11
@dshehbaj dshehbaj force-pushed the shehbaj/enhanced-subnet-discovery-rebase branch from 8baff5a to 34be861 Compare August 21, 2025 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants