There is something we don't understand about the Summary metric rpc_duration_all and the corresponding per-method metrics.
To make these easier to inspect, we created two recording rules so that we can query per-method percentiles by filtering on a method label:
- expr: label_replace({__name__=~"rpc_duration_.*_success"}, "method", "$1", "__name__", "rpc_duration_(.+)_success")
  record: geth_rpc_requests_success
- expr: label_replace({__name__=~"rpc_duration_.*_failure"}, "method", "$1", "__name__", "rpc_duration_(.+)_failure")
  record: geth_rpc_requests_failure
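To make it explicit what these rules do to each series, here is a minimal Python sketch of the label_replace step. It is a simplified model, not Prometheus code: the helper name `label_replace` and the example metric name `rpc_duration_eth_call_success` are assumptions for illustration (the name simply follows the pattern the rule's regex implies), and the sketch ignores that the recording rule afterwards stores the result under the `record` name.

```python
import re

def label_replace(series, dst, replacement, src, regex):
    """Simplified, hypothetical model of PromQL label_replace for one series.

    `series` is a dict of label name -> label value (including __name__).
    """
    value = series.get(src, "")
    # PromQL regexes are fully anchored, so the whole value must match.
    m = re.fullmatch(regex, value)
    if m is None:
        # No match: the series is returned unchanged.
        return dict(series)
    out = dict(series)
    # PromQL refers to capture groups as $1, $2, ...; translate to Python's \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out[dst] = m.expand(template)
    return out

# Assumed example series; only the naming pattern comes from the rules above.
sample = {
    "__name__": "rpc_duration_eth_call_success",
    "pod": "node-polygon-0",
    "quantile": "0.95",
}
relabelled = label_replace(
    sample, "method", "$1", "__name__", "rpc_duration_(.+)_success"
)
print(relabelled["method"])
```

Note that the rule only adds a method label; the sample values are still the per-method quantile values exported by the node, one series per method and quantile.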
So now when we compare P95 for all methods:
rpc_duration_all{pod='node-polygon-0', quantile='0.95'} 316677.89999999973
and P95 for top 5 methods:
topk(5, (sum(rate(geth_rpc_requests_success{quantile='0.95', pod="node-polygon-0"}[5m])) by (pod, method)))
{method="eth_getLogs", pod="node-polygon-0"} | 182.76462962962964
{method="eth_call", pod="node-polygon-0"} | 176.64814814814804
{method="eth_getTransactionReceipt", pod="node-polygon-0"} | 145.22407407407408
{method="eth_gasPrice", pod="node-polygon-0"} | 92.9074074074074
{method="eth_getTransactionByHash", pod="node-polygon-0"} | 20.548703703703705
See the results: the difference is huge. That should not be possible if summaries work correctly; those "top 5 P95 values" should be much closer to rpc_duration_all.
Can someone explain this behaviour?