Recent bug somewhere in instance group management.

I have a VERY OLD cluster in GKE, using a single spot instance and an Ingress.  It predates VPC-native LB.  At some point in the not-too-distant past it all worked fine.  Today it does not work fine.

It seems that whenever that spot node goes down and comes back (same name!), it is removed from the (unmanaged) IG and never re-added, which leaves my Ingress's BackendService with "0 of 0 healthy".  I can run gcloud to add the spot node back to the IG, and then everything is happy.

But now I have to do that every day or two.  

I know I should tear it down and make a new cluster, but it's just a chore I have not had time for.  I can 100% imagine how such a bug exists in a sync loop which compares nodes by name (because a recreated spot node comes back with the same name).
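
To illustrate the kind of bug I am imagining, here is a minimal sketch.  It is NOT the actual controller code; the `Instance` type, the `cached` membership map, and both function names are hypothetical, and I am assuming the controller keeps some view of IG membership keyed by node name.  If that view is stale, a name-only comparison sees nothing to add after the spot node is recreated, while a comparison that also checks the instance ID would notice.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical view of a VM: Name survives spot
// preemption, while ID changes when the VM is recreated.
type Instance struct {
	Name string
	ID   uint64
}

// missingByName returns the nodes whose names are absent from the cached
// group membership (name -> instance ID). This is the assumed buggy
// comparison: a recreated node with the same name looks like a member.
func missingByName(nodes []Instance, cached map[string]uint64) []string {
	var out []string
	for _, n := range nodes {
		if _, ok := cached[n.Name]; !ok {
			out = append(out, n.Name)
		}
	}
	sort.Strings(out)
	return out
}

// missingByIdentity also reports a node when the name matches but the
// instance ID differs, i.e. the VM behind the name was replaced.
func missingByIdentity(nodes []Instance, cached map[string]uint64) []string {
	var out []string
	for _, n := range nodes {
		if id, ok := cached[n.Name]; !ok || id != n.ID {
			out = append(out, n.Name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Stale view: the old incarnation of the spot node (ID 111).
	cached := map[string]uint64{"spot-node-1": 111}
	// Current state: same name, new incarnation (ID 222).
	nodes := []Instance{{Name: "spot-node-1", ID: 222}}

	fmt.Println(len(missingByName(nodes, cached))) // 0: nothing gets re-added
	fmt.Println(missingByIdentity(nodes, cached))  // [spot-node-1]
}
```

Whether the real sync loop has this shape is exactly what I have not verified.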

I don't have time to spelunk thru the controller right now.  It's not a critical workload for me, but it SHOULD be up and it USED TO work, so we changed SOMETHING at SOME POINT.


Recent bug somewhere in instance group management. #2920
