Recent bug somewhere in instance group management. #2920

@thockin

I have a VERY OLD cluster in GKE, using a single spot instance and an Ingress. It predates VPC-native LB. At some point in the not-too-distant past it all worked fine. Today it does not work fine.

It seems that whenever that spot node goes down and comes back (same name!), it is removed from the (unmanaged) instance group (IG) and never re-added, which leaves my Ingress's BackendService with "0 of 0 healthy". I can run gcloud to add the spot node back to the IG and then everything is happy.

But now I have to do that every day or two.

I know I should tear it down and make a new cluster, but it's just a chore I have not had time for. I can 100% imagine how such a bug exists in a sync loop which compares nodes by name (because the recreated spot node comes back with the same name).
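
To make that hypothesis concrete, here is a minimal sketch of how a name-only comparison could miss a recreated node. This is not the controller's actual code, and I have not confirmed that this is the bug. `Instance`, `instance_id`, `to_add_by_name`, and `to_add_by_identity` are all hypothetical names, and the sketch assumes the sync loop diffs against a stale view of IG membership that still lists the old instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    instance_id: int  # hypothetical: a unique ID that changes when the VM is recreated


def to_add_by_name(nodes, members):
    """Return nodes missing from the group, matching on name only."""
    member_names = {m.name for m in members}
    return [n for n in nodes if n.name not in member_names]


def to_add_by_identity(nodes, members):
    """Return nodes missing from the group, matching on (name, instance_id)."""
    member_keys = {(m.name, m.instance_id) for m in members}
    return [n for n in nodes if (n.name, n.instance_id) not in member_keys]


# The spot node was recreated: same name, different underlying instance.
old = Instance("spot-node-1", 111)
new = Instance("spot-node-1", 222)

# Assumption: the membership view is stale and still lists the old instance,
# even though the real IG no longer contains it.
stale_members = [old]
current_nodes = [new]

print(to_add_by_name(current_nodes, stale_members))      # nothing to add
print(to_add_by_identity(current_nodes, stale_members))  # the recreated node
```

With the name-only diff the loop sees nothing to do, so the recreated node never gets added back. Keying on something that changes across recreation would catch it, assuming such an identifier is available to the sync loop.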

I don't have time to spelunk through the controller right now. It's not a critical workload for me, but it SHOULD be up and it USED TO work, so we changed SOMETHING at SOME POINT.
