rpc: mismatch in rpc-metrics all versus individual #28619

@Aracki

We don't understand the relationship between the Summary metric rpc_duration_all and the corresponding per-method Summary metrics.

To make the per-method metrics easier to inspect, we created two recording rules so that we can query percentiles per method by filtering on a method label:

- expr: label_replace({__name__=~"rpc_duration_.*_success"}, "method", "$1", "__name__", "rpc_duration_(.+)_success")
  record: geth_rpc_requests_success
- expr: label_replace({__name__=~"rpc_duration_.*_failure"}, "method", "$1", "__name__", "rpc_duration_(.+)_failure")
  record: geth_rpc_requests_failure
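
For readers less familiar with label_replace, here is a minimal Python sketch of what the first rule's regex does to a metric name. The helper name `extract_method` and the sample metric names are assumptions for illustration, inferred from the rule and the method labels shown in this issue. Prometheus anchors the regex at both ends, which `re.fullmatch` mimics.

```python
import re

# Same pattern as the first recording rule. Prometheus fully anchors
# label_replace regexes, so fullmatch is the closest Python equivalent.
SUCCESS_PATTERN = re.compile(r"rpc_duration_(.+)_success")


def extract_method(metric_name):
    """Return the value the rule would copy into the "method" label, or None.

    Hypothetical helper, not part of geth or Prometheus.
    """
    match = SUCCESS_PATTERN.fullmatch(metric_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Sample names are assumed; only the method names appear in the issue.
    for name in (
        "rpc_duration_eth_getLogs_success",
        "rpc_duration_eth_call_success",
        "rpc_duration_all",
    ):
        print(name, "->", extract_method(name))
```

Note that rpc_duration_all has no _success or _failure suffix, so neither rule's selector picks it up and it only exists under its original name.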

Now we compare the P95 across all methods:

rpc_duration_all{pod='node-polygon-0', quantile='0.95'}    316677.89999999973

with the P95 for the top 5 methods:

topk(5, (sum(rate(geth_rpc_requests_success{quantile='0.95', pod="node-polygon-0"}[5m])) by (pod, method)))

{method="eth_getLogs", pod="node-polygon-0"} | 182.76462962962964
{method="eth_call", pod="node-polygon-0"} | 176.64814814814804
{method="eth_getTransactionReceipt", pod="node-polygon-0"} | 145.22407407407408
{method="eth_gasPrice", pod="node-polygon-0"} | 92.9074074074074
{method="eth_getTransactionByHash", pod="node-polygon-0"} | 20.548703703703705

The difference is huge. That should not be possible if summaries work correctly: those "top 5 P95 values" should be much closer to rpc_duration_all.

Can someone explain this behaviour?
