Make latency impact of decision logs predictable

## What is the underlying problem you're trying to solve?

The decision log plugin currently runs in one of two modes: unbounded (all decisions are kept) and bounded (decisions are discarded when the buffer overflows). The intended trade-off is between auditability and availability. In the unbounded mode, however, the likelihood of an OOM kill grows when the decision log API is overloaded, so decisions are still lost.

On top of that, the decision log plugin holds fairly heavy locks that, for example, force decisions to be encoded one at a time. When measuring the raw performance of a fleet of OPAs (~50 instances at 30,000 rps), we saw a p99 latency roughly one order of magnitude higher with decision logs turned on than without.

## Describe the ideal solution

If we change the trade-off to auditability vs. latency guarantees, a lock-free ring buffer with a fixed size could be used as an alternative to the existing solution. This would bound memory usage in both cases.
If auditability is favoured, an offered chunk would be retried until it fits into the buffer (this creates back pressure and increases latency). If low latency is favoured, an offered chunk that cannot be placed in the buffer is discarded.
Either behaviour can be achieved without holding any locks.
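
To make the proposal more concrete, here is a minimal sketch of such a buffer. It is not taken from the OPA code base: the names `RingBuffer`, `TryOffer`, `Offer` and `Poll` are hypothetical, and the design follows the well-known bounded queue with per-slot sequence numbers, built only on `sync/atomic`. It takes no mutexes, although a producer that stalls between claiming a slot and publishing it can briefly delay consumers, so it is not lock-free in the strictest formal sense.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// slot holds one chunk plus a sequence number that tells producers and
// consumers whose turn it is to use the slot.
type slot struct {
	seq   atomic.Uint64
	chunk []byte
}

// RingBuffer is a fixed-size multi-producer/multi-consumer queue.
// Hypothetical type for illustration; not part of OPA.
type RingBuffer struct {
	mask  uint64
	slots []slot
	head  atomic.Uint64 // next position to write
	tail  atomic.Uint64 // next position to read
}

// NewRingBuffer allocates all memory up front; size must be a power of two.
func NewRingBuffer(size uint64) *RingBuffer {
	if size == 0 || size&(size-1) != 0 {
		panic("size must be a power of two")
	}
	rb := &RingBuffer{mask: size - 1, slots: make([]slot, size)}
	for i := range rb.slots {
		rb.slots[i].seq.Store(uint64(i))
	}
	return rb
}

// TryOffer is the low-latency mode: it returns false instead of waiting
// when the buffer is full, so the caller can discard the chunk.
func (rb *RingBuffer) TryOffer(chunk []byte) bool {
	for {
		pos := rb.head.Load()
		s := &rb.slots[pos&rb.mask]
		diff := int64(s.seq.Load()) - int64(pos)
		switch {
		case diff == 0: // slot is free for this position
			if rb.head.CompareAndSwap(pos, pos+1) {
				s.chunk = chunk
				s.seq.Store(pos + 1) // publish to consumers
				return true
			}
		case diff < 0: // slot still holds an unread chunk: buffer is full
			return false
		}
		// Otherwise another producer claimed this position; retry.
	}
}

// Offer is the auditability mode: it retries until the chunk fits,
// which creates back pressure on the caller.
func (rb *RingBuffer) Offer(chunk []byte) {
	for !rb.TryOffer(chunk) {
		runtime.Gosched() // yield so consumers can drain the buffer
	}
}

// Poll removes the oldest chunk; ok is false if nothing is available.
func (rb *RingBuffer) Poll() (chunk []byte, ok bool) {
	for {
		pos := rb.tail.Load()
		s := &rb.slots[pos&rb.mask]
		diff := int64(s.seq.Load()) - int64(pos+1)
		switch {
		case diff == 0: // slot has been published for this position
			if rb.tail.CompareAndSwap(pos, pos+1) {
				chunk = s.chunk
				s.chunk = nil
				s.seq.Store(pos + rb.mask + 1) // free slot for the next lap
				return chunk, true
			}
		case diff < 0: // nothing published yet
			return nil, false
		}
		// Otherwise another consumer claimed this position; retry.
	}
}
```

In this sketch the query path would call `TryOffer` (and count a dropped decision when it returns false) or `Offer`, depending on the configured mode, while an uploader goroutine drains the buffer with `Poll`. Memory use is fixed by `size` in both modes.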

Make latency impact of decision logs predictable #5724
