thanos_compact_halted metric never cleared #8440

johngmyers · 2025-04-25T22:52:55Z

Thanos, Prometheus and Golang version used:

Thanos v0.37.0
Prometheus v3.1.0

Object Storage Provider:

S3

What happened:

Thanos-compact threw a transient error "preallocate: no space left on device". Although that cleanup loop failed, subsequent cleanup loops completed without error.

The thanos_compact_halted metric was set to 1 and remained so, despite compact making progress.

On-call personnel had to manually restart the pod in order to clear the resulting alert.

What you expected to happen:

The metric should have been set back to 0 once compact completed a loop without error, so that on-call personnel do not have to waste time manually restarting the pod.
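To make the expectation concrete, here is a minimal sketch in Python. It is not Thanos code: HaltedGauge, run_loops, ok and disk_full are hypothetical names invented for illustration, and the gauge is a plain object rather than a real Prometheus client gauge. It only shows the semantics being asked for, where the gauge mirrors the outcome of the most recent loop.

```python
import errno


class HaltedGauge:
    """Hypothetical stand-in for a 0/1 gauge such as thanos_compact_halted."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


def run_loops(iterations, gauge):
    """Run each iteration and record the gauge value after it finishes."""
    history = []
    for step in iterations:
        try:
            step()
        except OSError:
            gauge.set(1)  # the loop failed, so report the halted state
        else:
            gauge.set(0)  # the loop succeeded, so clear the halted state
        history.append(gauge.value)
    return history


def ok():
    """A loop iteration that completes without error."""


def disk_full():
    """A loop iteration that hits the transient error from this report."""
    raise OSError(errno.ENOSPC, "preallocate: no space left on device")


gauge = HaltedGauge()
history = run_loops([ok, disk_full, ok], gauge)
print(history)
```

With this behavior an alert on the gauge would resolve on its own after the first clean loop, with no pod restart.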

How to reproduce it (as minimally and precisely as possible):

Run Thanos in production.

Full logs to relevant components:

Anything else we need to know:

@johngmyers · 2025-04-25T22:54:27Z

dosubot[bot]
bot Apr 25, 2025

Hello @johngmyers! 👋 I'm Dosu, a bot here to help you solve bugs, answer questions, and assist you in becoming a contributor. I'm here while you wait for a human maintainer. How can I assist you today?

The thanos_compact_halted metric stays at 1 because the Compactor is designed to halt on certain data failures rather than crash. Once a halt error occurs, the Compactor stops retrying compaction and sets thanos_compact_halted to 1 to mark the halt state. The Compactor does not resolve the halt condition on its own, so clearing the alert requires manual intervention such as restarting the process ^[1]^[2].

To reduce manual intervention, consider passing the hidden flag --no-debug.halt-on-error. With this flag set, the Compactor exits on a halt error instead of halting, which lets the process restart and potentially resolve the issue automatically ^[1].
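The two modes described above can be sketched as a small Python model. This is a simplification for illustration, not the Compactor's implementation: on_halt_error and fresh_metrics are hypothetical helpers, and the model assumes that a newly started process reports the gauge as 0.

```python
def on_halt_error(halt_on_error: bool, metrics: dict) -> str:
    """Model what happens when a halt error is detected (simplified)."""
    if halt_on_error:
        # Halting mode as described above: mark the halt and keep the process alive.
        metrics["thanos_compact_halted"] = 1
        return "halted"
    # With --no-debug.halt-on-error: exit so that a supervisor can restart the process.
    return "exited"


def fresh_metrics() -> dict:
    """Assumption: a newly started process begins with the gauge at 0."""
    return {"thanos_compact_halted": 0}


# Halting mode: the process stays up and keeps reporting 1.
default_metrics = fresh_metrics()
default_outcome = on_halt_error(True, default_metrics)

# Exit mode: the process exits and its replacement starts from a clean state.
exit_outcome = on_halt_error(False, fresh_metrics())
restarted_metrics = fresh_metrics()

print(default_outcome, default_metrics["thanos_compact_halted"])
print(exit_outcome, restarted_metrics["thanos_compact_halted"])
```

In the exit mode the restart is delegated to whatever supervises the process, for example the pod's restart policy, rather than to on-call personnel.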

