thanos_compact_halted metric never cleared #8440
Replies: 1 comment
Hello @johngmyers! 👋 I'm Dosu, a bot here to help you solve bugs, answer questions, and assist you in becoming a contributor. I'm here while you wait for a human maintainer. How can I assist you today? The … To minimize manual intervention, you can consider using the …
Thanos, Prometheus and Golang version used:
Thanos v0.37.0
Prometheus v3.1.0
Object Storage Provider:
S3
What happened:
Thanos-compact threw a transient error "preallocate: no space left on device". Although that cleanup loop failed, subsequent cleanup loops completed without error.
The `thanos_compact_halted` metric was set to `1` and remained so, despite compact making progress.
On-call personnel had to manually restart the pod in order to clear the resulting alert.
What you expected to happen:
The metric should have been set back to `0` once compact completed a loop without error, so we don't have to have on-call personnel waste time manually restarting the pod.
How to reproduce it (as minimally and precisely as possible):
Run Thanos in production.
Full logs to relevant components:
Anything else we need to know:
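To make the expected behavior concrete, here is a minimal sketch in Go. It is not the Thanos implementation: `haltedGauge` and `runLoop` are hypothetical names, and the gauge is a plain struct standing in for a Prometheus gauge such as `thanos_compact_halted`. It only illustrates the request that a loop which completes without error should set the metric back to `0`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSpace mimics the transient error from the report.
var errNoSpace = errors.New("preallocate: no space left on device")

// haltedGauge is a hypothetical stand-in for a Prometheus gauge such as
// thanos_compact_halted. It is an assumption made for this sketch only.
type haltedGauge struct{ value float64 }

// Set stores the latest gauge value.
func (g *haltedGauge) Set(v float64) { g.value = v }

// runLoop runs one iteration and updates the gauge from its outcome:
// 1 when the iteration fails, 0 when it completes without error.
func runLoop(g *haltedGauge, iteration func() error) error {
	if err := iteration(); err != nil {
		g.Set(1)
		return err
	}
	g.Set(0)
	return nil
}

func main() {
	g := &haltedGauge{}

	// First loop fails with the transient error, so the gauge goes to 1.
	_ = runLoop(g, func() error { return errNoSpace })
	fmt.Println("after failed loop:", g.value)

	// A later loop succeeds, so the gauge is cleared without a restart.
	_ = runLoop(g, func() error { return nil })
	fmt.Println("after successful loop:", g.value)
}
```

Whether clearing the metric is safe for every kind of halt is a separate question; the sketch assumes the error was transient, as it was in this report.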