
[Feature Request]: Observability - Pre-built Grafana Dashboards & Loki Log Export #272

@crivetimihai


🧭 Epic

Title: Grafana Dashboards & Loki Log Export


Goal

Ship a turn-key observability bundle for MCP Gateway consisting of:

  1. Pre-built Grafana dashboards (JSON) covering request latency, error rates, policy-deny counts, CPU/memory, and per-tool call volume.
  2. Loki log export pipeline (Promtail config + sample docker-compose.yml) that pushes structured gateway logs to Loki, with Grafana log panels pre-wired.
  3. Helm-chart options to enable the stack out-of-the-box (helm install gateway charts/mcpgateway --set observability.enabled=true).

Milestone: Release 0.5.0: Enterprise Operability & Observability

Repository impact: charts/mcpgateway/ (Helm values and templates), plus a new observability/ folder for the dashboard JSON and the sample compose stack.


🧭 Type of Feature

  • Observability / dashboards
  • Developer & operator tooling

🛠 Deliverables

| Artifact | Path | Notes |
| --- | --- | --- |
| Grafana JSON | observability/dashboards/mcpgateway_core.json | Latency (p50/p95), req/s, status-code mix |
| Grafana JSON | observability/dashboards/mcpgateway_per_tool.json | Calls, error %, and average duration grouped by the tool.name label |
| Grafana JSON | observability/dashboards/mcpgateway_k8s.json | CPU, memory, and restarts (via kube-state-metrics) |
| Promtail config | observability/loki/promtail.yaml | Multiline JSON parser for gateway logs |
| Compose file | observability/loki/docker-compose.yml | Stand-alone stack with Loki + Grafana + Promtail |
| Helm values | charts/mcpgateway/values.yaml | New observability.* section: dashboards.enabled, loki.enabled, retentionDays |
| Docs | docs/observability/grafana.md | Import steps, screenshots, alert examples |

🙋‍♂️ User Stories & Acceptance Criteria

Story 1 — One-Command Local Stack

Scenario: Spin up full stack via docker-compose
Given I have cloned the observability folder
When I run "docker compose up -d" in observability/loki
Then the Grafana UI on localhost:3000 shows the dashboard "MCP Gateway • Core"
And its panels populate within 10 seconds of traffic
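A minimal sketch of a Story 1 smoke check, assuming the compose stack enables anonymous Grafana access (for example GF_AUTH_ANONYMOUS_ENABLED=true; otherwise basic auth would have to be added). It polls Grafana's /api/search endpoint until a dashboard with the expected title appears. The helper names are hypothetical, and the check only confirms that the dashboard was provisioned, not that its panels already show data.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def http_get_json(url: str) -> object:
    """Fetch a URL and decode its JSON body (stdlib only, no auth: assumes anonymous access)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def dashboard_present(search_results: object, title: str) -> bool:
    """True if Grafana search results contain a dashboard with exactly this title."""
    if not isinstance(search_results, list):
        return False
    return any(
        isinstance(item, dict)
        and item.get("type") == "dash-db"
        and item.get("title") == title
        for item in search_results
    )


def wait_for_dashboard(
    base_url: str,
    title: str,
    fetch: Callable[[str], object] = http_get_json,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll GET /api/search until the dashboard appears or timeout_s elapses."""
    query = urllib.parse.urlencode({"query": title, "type": "dash-db"})
    url = f"{base_url.rstrip('/')}/api/search?{query}"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if dashboard_present(fetch(url), title):
                return True
        except (OSError, ValueError):
            # Grafana may still be starting (connection refused, HTTP error) or return non-JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: wait_for_dashboard("http://localhost:3000", "MCP Gateway • Core") returns True once provisioning has finished. The fetch parameter exists so the polling logic can be exercised without a running Grafana.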

Story 2 — Helm Chart Auto-Import

Scenario: Dashboards auto-load in cluster
When I run "helm install gateway charts/mcpgateway --set observability.enabled=true"
Then a ConfigMap "gateway-grafana-dashboards" contains the JSON dashboards
And the Grafana sidecar picks them up within 1 minute
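The ConfigMap in Story 2 would normally be rendered by a Helm template that reads the dashboard files; the Python sketch below builds the equivalent manifest so its shape is easy to inspect. It assumes a Grafana dashboard sidecar like the one in the upstream Grafana chart, which discovers ConfigMaps by the grafana_dashboard label; the value "1" is an assumed convention and must match the sidecar's configured labelValue. The function name and the default namespace are hypothetical.

```python
import json
from pathlib import Path

# Assumption: the Grafana sidecar watches for ConfigMaps carrying this label.
# The key matches the upstream chart default; the value "1" is a convention and
# must match the sidecar's configured labelValue.
SIDECAR_LABELS = {"grafana_dashboard": "1"}


def build_dashboards_configmap(
    dashboard_dir: Path,
    name: str = "gateway-grafana-dashboards",
    namespace: str = "default",  # hypothetical; a Helm template would use the release namespace
) -> dict:
    """Embed every *.json dashboard in dashboard_dir into a single ConfigMap manifest."""
    data = {}
    for path in sorted(dashboard_dir.glob("*.json")):
        text = path.read_text(encoding="utf-8")
        json.loads(text)  # fail fast on malformed JSON before it reaches the cluster
        data[path.name] = text
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": dict(SIDECAR_LABELS),
        },
        "data": data,
    }
```

Usage: print(json.dumps(build_dashboards_configmap(Path("observability/dashboards")), indent=2)) emits a manifest that kubectl apply -f accepts, since kubectl reads JSON as well as YAML. One ConfigMap for all dashboards keeps the chart simple, but ConfigMap data is capped at 1 MiB, so a large dashboard set may need one ConfigMap per file.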

Story 3 — Loki Log Query Panel

Scenario: Query deny logs
Given gateway emits log {"level":"warn","msg":"policy_deny","tool":"db.backup"}
When I open Grafana Explore and run {app="gateway"} |= "policy_deny"
Then results include the deny entry within 10 s
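Story 3 assumes the gateway writes one JSON object per log line. A minimal sketch using Python's standard logging module follows; the gateway's real logging setup may differ, and the extra fields tool and status are assumptions taken from the labels listed in this issue. The logger name is hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Hypothetical structured fields the gateway would pass via `extra=`.
    EXTRA_FIELDS = ("tool", "status")

    def format(self, record: logging.LogRecord) -> str:
        # Story 3 expects "warn" rather than Python's "warning".
        level = "warn" if record.levelno == logging.WARNING else record.levelname.lower()
        entry = {"level": level, "msg": record.getMessage()}
        for field in self.EXTRA_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        if record.exc_info:
            # json.dumps escapes the traceback's newlines, so the entry stays on one line.
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, separators=(",", ":"))


def make_logger(stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stream (stdout by default)."""
    logger = logging.getLogger("mcpgateway.example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]  # replace handlers so repeated calls don't duplicate output
    logger.propagate = False
    return logger
```

Usage: make_logger().warning("policy_deny", extra={"tool": "db.backup"}) writes {"level":"warn","msg":"policy_deny","tool":"db.backup"}, the exact entry the scenario queries for. Emitting tracebacks inside the JSON object keeps entries single-line, so the Promtail multiline stage becomes a fallback rather than a requirement.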

Story 4 — Alert Rule Example

Scenario: High 5xx alert fires
Given the average rate of status=5xx responses stays above 1 req/s for 5 minutes
Then Alertmanager (optional) sends a "High Error Rate" alert
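One possible shape for the Story 4 rule, built in Python as a Prometheus alerting-rule group. The counter name http_requests_total and its status label are hypothetical placeholders for whatever request metric the gateway actually exports. In the expression, rate(...[5m]) supplies the per-second average, and for: 5m requires the condition to hold continuously for five minutes before the alert fires.

```python
def high_5xx_rule(
    metric: str = "http_requests_total",  # hypothetical counter name
    threshold_rps: float = 1.0,
    window: str = "5m",
    hold: str = "5m",
) -> dict:
    """Build a Prometheus rule group containing the "High Error Rate" alert.

    Assumes `metric` is a request counter with a `status` label holding the HTTP code.
    """
    # rate() over `window` gives the average per-second 5xx rate; sum() aggregates instances.
    expr = f'sum(rate({metric}{{status=~"5.."}}[{window}])) > {threshold_rps}'
    return {
        "groups": [
            {
                "name": "mcpgateway.rules",
                "rules": [
                    {
                        "alert": "HighErrorRate",
                        "expr": expr,
                        "for": hold,  # condition must stay true this long before firing
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "summary": f"High Error Rate: 5xx above {threshold_rps} req/s for {hold}",
                        },
                    }
                ],
            }
        ]
    }
```

Because JSON is valid YAML, json.dumps(high_5xx_rule(), indent=2) can be written to a rule file and validated with promtool check rules before it is loaded by Prometheus and routed through Alertmanager.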

📐 Architecture Sketch (Mermaid)

flowchart TD
    subgraph Cluster
        Gateway((MCP Gateway))
        Promtail[[Promtail sidecar]]
        Loki[(Loki)]
        Grafana[(Grafana)]
        DashboardsCM[[Dashboards ConfigMap]]

        Gateway --> Promtail
        Promtail --> Loki
        Grafana --> Loki
        DashboardsCM --> Grafana
    end

📂 Component Matrix

| Component / Path | Purpose |
| --- | --- |
| observability/dashboards/*.json | Pre-built Grafana dashboards |
| observability/loki/promtail.yaml | Promtail pipeline (k8s & docker) |
| observability/loki/docker-compose.yml | Quick-start stack |
| charts/mcpgateway/templates/grafana-dashboards.yaml | ConfigMap embedding the dashboards |
| charts/mcpgateway/values.yaml | observability.enabled, observability.loki.enabled, observability.retentionDays |
| docs/observability/grafana.md | Setup guide, screenshots, sample alerts |

📋 Global Acceptance Checklist

  • A local docker compose up brings up Gateway + Loki + Grafana, and the dashboards auto-populate.
  • A Helm install with observability.enabled=true loads the dashboards via the Grafana sidecar.
  • Promtail parses multiline JSON logs and promotes level, tool, and status to labels.
  • Example alert rule (High 5xx) included and documented.
  • Dashboards pass grafana-dashboard-validator CI step.
  • CI workflow publishes dashboard JSONs as release asset.
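To make the multiline JSON item concrete, this sketch emulates what the Promtail pipeline is assumed to do: a multiline stage whose firstline regex starts a new entry at any line beginning with {, then a json stage, then a labels stage promoting level, tool, and status. It is a local model for testing sample logs, not Promtail itself, and the firstline pattern is an assumption about how gateway entries begin.

```python
import json
import re
from typing import Iterable, Iterator

# Assumed pipeline: multiline (firstline "^\{") -> json -> labels(level, tool, status).
FIRSTLINE = re.compile(r"^\{")
LABEL_KEYS = ("level", "tool", "status")


def group_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Join continuation lines onto the entry started by the last firstline match."""
    buffer: list[str] = []
    for line in lines:
        if FIRSTLINE.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        yield "\n".join(buffer)


def extract_labels(entry: str) -> dict[str, str]:
    """Parse one JSON entry and keep only the keys promoted to Loki labels."""
    fields = json.loads(entry)
    return {key: str(fields[key]) for key in LABEL_KEYS if key in fields}
```

Keeping the label set this small matters because every distinct label combination creates a separate Loki stream; high-cardinality fields such as request IDs are better left in the log body and filtered at query time with |= or the | json parser.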

🔄 Roll-Out Plan

  1. Create the observability/ directory with the dashboards and the compose stack.
  2. Build the Promtail pipeline (JSON parser + label mapping).
  3. Author the three dashboards in Grafana 10 and export them as JSON.
  4. Add the Helm chart fields and the ConfigMap template; wire up the label the Grafana sidecar uses to discover dashboards.
  5. Write docs with screenshots and Loki query snippets.
  6. Add a GitHub Actions workflow, dashboard_test.yml, that lints the dashboard JSON and validates UID uniqueness.
  7. QA on Minikube; update README quick-start.
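A sketch of the check that dashboard_test.yml (roll-out step 6) could run: parse every dashboard file and report invalid JSON plus missing or duplicate uid values. The script, its function names, and the default directory are hypothetical; the real workflow might instead use an existing JSON linter for the first half.

```python
import json
import sys
from pathlib import Path


def check_dashboards(dashboard_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means every dashboard passed."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # uid -> file that first declared it
    for path in sorted(dashboard_dir.glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg} at line {exc.lineno})")
            continue
        uid = dashboard.get("uid") if isinstance(dashboard, dict) else None
        if not uid:
            problems.append(f"{path.name}: missing dashboard uid")
        elif uid in seen:
            problems.append(f"{path.name}: uid {uid!r} already used by {seen[uid]}")
        else:
            seen[uid] = path.name
    return problems


def main(argv: list[str]) -> int:
    """CLI entry point: optional first argument is the dashboard directory."""
    target = Path(argv[1]) if len(argv) > 1 else Path("observability/dashboards")
    problems = check_dashboards(target)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

In the workflow, a step would run the file as a script ending in sys.exit(main(sys.argv)), so any reported problem produces a non-zero exit status and fails the job.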

Labels: devops, enhancement, observability, triage

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions