
After migrating the cluster to 1.30.14 in China, new nodes can no longer join the cluster #17558

@nuved

Description


/kind bug

1. What kops version are you running? The command kops version, will display
this information.

Client version: 1.30.4 (git-v1.30.4)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.30.14

3. What cloud provider are you using?

AWS - China cluster

4. What commands did you run? What is the simplest way to reproduce this issue?

After migrating the cluster from 1.29.13 to 1.30.14 using kops 1.30.4, I see the kops-controller pods trying to fetch config files from the state-store bucket while defaulting the region to "us-east-1". That is wrong, and it may explain the error, since the bucket is in China and does not exist in any us-* region.

5. What happened after the commands executed?
After replacing the control-plane nodes with 1.30.14, new worker nodes are not able to join the cluster; they are stuck with this error:

```
W0814 09:09:37.859909 1528 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: kops-controller returned status code 400: failed to build node config
```

I also see these errors in the kops-controller pods:

```
I0814 09:16:15.920109 1 s3context.go:192] defaulting region to "us-east-1"
I0814 09:16:16.741856 1 s3context.go:209] unable to get bucket location from region "us-east-1"; scanning all regions: operation error S3: GetBucketLocation, https response error StatusCode: 400, RequestID: hjhj, HostID: hbgghgjgjh=, api error InvalidToken: The provided token is malformed or otherwise invalid.
```

I also checked the kops-controller ConfigMap, and the region there is correctly set to cn-north-1, so I am not sure why the logs show the default region being set to us-east-1.
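For context on the "defaulting region" log line: S3's GetBucketLocation API returns an empty LocationConstraint for buckets in us-east-1, so clients have to map the response back to a region name. A minimal sketch of that documented mapping (not kops' actual s3context code) looks like this:

```python
def location_to_region(location_constraint):
    """Map an S3 GetBucketLocation LocationConstraint to a region name.

    Per the S3 API docs: buckets in us-east-1 return an empty/null
    LocationConstraint, and "EU" is a legacy alias for eu-west-1.
    A China bucket returns its real region, e.g. "cn-north-1".
    """
    if not location_constraint:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


# A cn-north-1 bucket should never map to us-east-1 -- but the client
# must first reach an endpoint that can answer GetBucketLocation at all,
# which is where the aws-cn partition credentials seem to fail here.
print(location_to_region("cn-north-1"))
```

This is only to illustrate why a fallback to us-east-1 exists in S3 clients; the InvalidToken error above suggests the controller is querying an endpoint outside the aws-cn partition before it ever gets a usable LocationConstraint.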

6. What did you expect to happen?

I would expect new nodes to join the cluster and nodeup to get the proper config from kops-controller.

Do you think I should try kops 1.31.0 to see if the problem is gone there?

I also tried 1.32.0 and 1.32.1, but both failed when running `kops get assets --copy`:

```
I0814 09:28:06.841955 68638 executor.go:113] Tasks: 0 done / 167 total; 72 can run
I0814 09:28:07.623661 68638 executor.go:113] Tasks: 72 done / 167 total; 24 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x503fb73]

goroutine 612 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*SecurityGroup).FindDeletions(0xc00120ecc0, 0xc00162f2c0)
	k8s.io/kops/upup/pkg/fi/cloudup/awstasks/securitygroup.go:339 +0x293
k8s.io/kops/upup/pkg/fi.defaultDeltaRunMethod[...]({0x86e2dc0, 0xc00120ecc0}, 0xc00162f2c0)
	k8s.io/kops/upup/pkg/fi/default_methods.go:109 +0x5ba
k8s.io/kops/upup/pkg/fi.CloudupDefaultDeltaRunMethod(...)
	k8s.io/kops/upup/pkg/fi/default_methods.go:42
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*SecurityGroup).Run(0xc0013aa840?, 0x36?)
	k8s.io/kops/upup/pkg/fi/cloudup/awstasks/securitygroup.go:137 +0x27
k8s.io/kops/upup/pkg/fi.(*executor[...]).forkJoin.func1(0xf)
	k8s.io/kops/upup/pkg/fi/executor.go:223 +0x3d9
created by k8s.io/kops/upup/pkg/fi.(*executor[...]).forkJoin in goroutine 1
	k8s.io/kops/upup/pkg/fi/executor.go:204 +0xa7
```
