Bug: max inflight push requests counter is sometimes not decremented #12966

@colega

What is the bug?

A change in the gRPC library altered the behaviour of TapHandle, which makes this statement in dskit false:

// If we accept request (no error), eventually HandleRPC with stats.End notification will be called.

If a push request is slow enough that its context has already expired by the time the TapHandle check completes, the early exit introduced in grpc/grpc-go#8439 causes the StatsHandler not to be executed, so the inflight push requests counter is never decremented.

If this happens enough times, the counter can be left stuck at the limit, making the ingester reject all write requests while still reporting itself as healthy.

How to reproduce it?

Send push requests with a short deadline to an ingester pod that is struggling to process the load.

What did you think would happen?

N/A

What was your environment?

This has been happening in Mimir since r361.

Any additional context to share?

N/A

Labels: bug
