-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Hi,
It seems that thanos store gateway does not manage well retries in case of HTTP 429 at startup.
I have a case with an S3 load balancer limiting to 1000 query per 10 seconds (100 qps) per IP address and returning HTTP 429 on rate limit error.
If I try to restart store gateway I see on the log of the load balancer 429 errors on HEAD requests to meta.json and the log of the storegateway freeze at line :
caller=fetcher.go:480 level=debug component=block.BaseFetcher msg="fetching meta data" concurrency=32
In this configuration the store gateway never manage to start even after 3 days despite the fact that I have only 3TB of data on s3.
If I change the rate limiting configuration to 4000 query per 10 seconds per IP, it works.
In my view, the store gateway should not be blocked by 429 errors. It should just start more slowly.
Maybe the retries are not well handled at startup ?
I guess that instead of doing retries on each failing query, thanos restart the whole batch of query after some delay ?
Regards