AWS Load Balancer Controller continues to expose high-cardinality, unbounded metrics on its Prometheus endpoint #2897

@yodaflomaster

Describe the bug
After upgrading the AWS Load Balancer Controller with Helm, I keep seeing the rest_client_request_latency_seconds histogram exposed on the Prometheus metrics endpoint. It has a url label containing the request URI, so every API path the controller calls gets its own set of series; in total that adds up to about 900 metrics. I deleted the chart, checked the dependencies, and redeployed, but the problem did not go away. Related:
kubernetes-sigs/controller-runtime#1423
kubernetes-sigs/controller-runtime#1587

...
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.004"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.008"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.016"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.032"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.064"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.128"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.256"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.512"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="+Inf"} 1
rest_client_request_latency_seconds_sum{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 0.010152667
rest_client_request_latency_seconds_count{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.004"} 127
...
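The ~900 figure follows from how histograms are exported: each distinct (url, verb) label set is a separate histogram child, and each child produces one _bucket line per boundary (11 boundaries in the sample above, including +Inf) plus one _sum and one _count line. The following Python sketch counts children and series in a saved dump of the metrics endpoint. It is illustrative only: count_histogram_series is a hypothetical helper, not part of the controller or of any Prometheus library, and its regex-based parser assumes well-formed exposition lines.

```python
import re

# One sample line: metric name, optional {labels}, whitespace, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair; backslash-escaped characters are allowed inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def count_histogram_series(text, metric):
    """Return (children, series) for a histogram in Prometheus text format.

    A "child" is one distinct label set after dropping the `le` label;
    a "series" is every exported _bucket, _sum, or _count line.
    """
    suffixes = (metric + "_bucket", metric + "_sum", metric + "_count")
    children = set()
    series = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group(1) not in suffixes:
            continue  # not a line belonging to this histogram
        labels = dict(LABEL_RE.findall(m.group(2) or ""))
        labels.pop("le", None)  # bucket boundary, not part of the child identity
        children.add(tuple(sorted(labels.items())))
        series += 1
    return len(children), series
```

With 11 boundaries each child costs 13 series, so if all ~900 metrics belong to this histogram and every child uses the same boundaries, that corresponds to roughly 69 distinct (url, verb) pairs.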

Steps to reproduce:

Deploy the aws-load-balancer-controller Helm chart with the ServiceMonitor disabled (chart value serviceMonitor.enabled=false), then fetch metrics from the exposed Prometheus endpoint (chart default: :8080/metrics).
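Checking for the metric can be scripted. Below is a minimal sketch using only the Python standard library. It assumes the metrics port is reachable at http://localhost:8080/metrics, for example after `kubectl port-forward deploy/aws-load-balancer-controller 8080:8080 -n <namespace>` (the deployment name comes from fullnameOverride in the values file below; the namespace is whatever the chart was installed into). The helper names metric_families, has_metric, and report are hypothetical.

```python
import urllib.request

# Assumption: the controller's metrics port has been forwarded to localhost.
METRICS_URL = "http://localhost:8080/metrics"
METRIC = "rest_client_request_latency_seconds"


def metric_families(text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or whitespace (no labels).
        names.add(line.split("{", 1)[0].split()[0])
    return names


def has_metric(text, metric):
    """True if any sample of `metric` (or its _bucket/_sum/_count) is present."""
    wanted = {metric, metric + "_bucket", metric + "_sum", metric + "_count"}
    return bool(metric_families(text) & wanted)


def report(url=METRICS_URL):
    """Fetch the endpoint and print whether the metric is exposed."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    print(f"{METRIC} present: {has_metric(body, METRIC)}")
```

Calling report() prints whether any rest_client_request_latency_seconds sample is exposed; with the behavior described in this issue it reports the metric as present.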

Expected outcome:

The rest_client_request_latency_seconds metric is not present at all in the exposed metrics.

Environment:

  • AWS Load Balancer controller: v2.4.5
  • Chart version: 1.4.6
  • EKS: 1.21.14-eks-fb459a0

Additional Context:
Here is my chart values file; all other values are left at their defaults.

replicaCount: 2

image:
  repository: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller
  tag: v2.4.5
  pullPolicy: IfNotPresent

clusterName: main-eks-qa

fullnameOverride: aws-load-balancer-controller

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::############:role/aws-load-balancer-controller

podLabels:
  ######.####/instance: aws-load-balancer-controller

webhookTLS:
  caCert:
  cert:
  key:

disableIngressClassAnnotation: true

disableIngressGroupNameAnnotation: true

podDisruptionBudget:
  maxUnavailable: 1

serviceMonitor:
  enabled: false
  additionalLabels: {}
  interval: 1m

clusterSecretsPermissions:
  allowAllSecrets: false
