Description
I have a VERY OLD cluster in GKE, using a single spot instance and an Ingress. It predates VPC-native LB. At some point in the not-too-distant past it all worked fine. Today it does not work fine.
It seems that whenever that spot node goes down and comes back (same name!), it is removed from the (unmanaged) instance group (IG) and never re-added, which leaves my Ingress's BackendService with "0 of 0 healthy". I can run gcloud to add the spot node back to the IG and then everything is happy.
But now I have to do that every day or two.
I know I should tear it down and make a new cluster, but it's just a chore I have not had time for. I can 100% imagine how such a bug exists in a sync loop which compares nodes by name: a recreated spot node reuses the same name, so a name-only comparison would see nothing to add.
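To make the suspected failure mode concrete, here is a minimal Python sketch. It is not taken from the controller, and I have not confirmed that the controller works this way. The `Instance` type, the `instance_id` field, both diff functions, and the idea of a stale cached membership list are all hypothetical, chosen only to show how a name-only comparison could miss a recreated node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified view of a VM."""
    name: str
    instance_id: int  # assumed to change each time the spot VM is recreated


def missing_by_name(nodes, cached_members):
    """Return names of nodes to add to the IG, comparing by name only."""
    known = {m.name for m in cached_members}
    return [n.name for n in nodes if n.name not in known]


def missing_by_identity(nodes, cached_members):
    """Return names of nodes to add to the IG, comparing by (name, id)."""
    known = {(m.name, m.instance_id) for m in cached_members}
    return [n.name for n in nodes if (n.name, n.instance_id) not in known]


# Hypothetical scenario: the sync loop still remembers the old incarnation
# of the node, but the VM was deleted (and dropped from the unmanaged IG)
# and then recreated under the same name.
cached_members = [Instance("spot-node-1", 1001)]
nodes = [Instance("spot-node-1", 2002)]

print(missing_by_name(nodes, cached_members))      # []
print(missing_by_identity(nodes, cached_members))  # ['spot-node-1']
```

In this toy model the name-only diff reports nothing to add, so the node would stay out of the IG until someone adds it by hand, which matches the symptom. Whether the real sync loop behaves like this is exactly what would need checking.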
I don't have time to spelunk through the controller right now. It's not a critical workload for me, but it SHOULD be up and it USED TO work, so we changed SOMETHING at SOME POINT.