store gateway cannot start if it hits rate limit on s3 #8328

@julienlau

Hi,

It seems that the Thanos store gateway does not handle retries well when it receives HTTP 429 responses at startup.

I have an S3 load balancer that limits each IP address to 1000 queries per 10 seconds (100 qps) and returns HTTP 429 when the limit is exceeded.

If I restart the store gateway, the load balancer log shows 429 errors on HEAD requests to meta.json, and the store gateway log freezes at this line:
caller=fetcher.go:480 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=32
In this configuration the store gateway never manages to start, even after 3 days, although I have only 3TB of data on S3.

If I raise the rate limit to 4000 queries per 10 seconds per IP (400 qps), it works.

In my view, the store gateway should not be blocked by 429 errors. It should just start more slowly.
Maybe retries are not handled well at startup?
My guess is that, instead of retrying each failing query individually, Thanos restarts the whole batch of queries after some delay?

Regards
