🎯 Performance Optimization Strategy
Goal: Implement compression, HTTP/2, and FastAPI optimizations using nginx reverse proxy with optimized Gunicorn workers.
📋 Complete Implementation Plan
1. FastAPI Application Optimizations
Update mcpgateway/main.py:
# mcpgateway/main.py
import time

from brotli_asgi import BrotliMiddleware
from fastapi import FastAPI, Request
from fastapi.responses import ORJSONResponse
from prometheus_fastapi_instrumentator import Instrumentator
from starlette.middleware.gzip import GZipMiddleware

# Create app with optimized JSON response
app = FastAPI(
    title="MCP Gateway",
    default_response_class=ORJSONResponse,  # 2-4x faster JSON
    docs_url="/docs",
    redoc_url="/redoc",
)

# Configure compression middleware. Note: add_middleware() prepends, so the
# middleware added last runs first (outermost).
# 1. Brotli for modern clients (better compression ratios)
app.add_middleware(
    BrotliMiddleware,
    quality=4,          # 1-11, 4 is balanced
    mode="text",        # Optimize for JSON/text
    lgwin=22,           # Window size
    minimum_size=1024,  # Skip small responses
)
# 2. GZip fallback for clients without Brotli support
#    (brotli-asgi also offers gzip_fallback=True, so this is belt-and-braces)
app.add_middleware(
    GZipMiddleware,
    minimum_size=1024,  # Only compress >1KB
    compresslevel=6,    # Balanced speed/ratio
)

# 3. Prometheus metrics
instrumentator = Instrumentator(
    should_group_status_codes=True,
    should_ignore_untemplated=True,
    should_respect_env_var=True,
    should_instrument_requests_inprogress=True,
    excluded_handlers=[".*admin.*", "/metrics"],
    env_var_name="ENABLE_METRICS",
    inprogress_name="mcp_requests_inprogress",
    inprogress_labels=True,
)
instrumentator.instrument(app).expose(app)

# Add custom headers for debugging
@app.middleware("http")
async def add_performance_headers(request: Request, call_next):
    start_time = time.time()  # capture before the handler runs
    response = await call_next(request)
    response.headers["X-Process-Time"] = str(time.time() - start_time)
    response.headers["X-Server"] = "MCP-Gateway"
    return response
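The "2-4x faster JSON" figure varies with payload shape, so it is worth measuring on representative data. A minimal, self-contained sketch (the payload below is made up for illustration):
# benchmark_json.py - hypothetical micro-benchmark for the serializer choice
import json
import timeit

import orjson

payload = {"tools": [{"id": i, "name": f"tool-{i}", "tags": ["alpha", "beta"]} for i in range(1000)]}

stdlib = timeit.timeit(lambda: json.dumps(payload), number=1000)
fast = timeit.timeit(lambda: orjson.dumps(payload), number=1000)
print(f"json:   {stdlib:.3f}s")
print(f"orjson: {fast:.3f}s  ({stdlib / fast:.1f}x faster)")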
Create optimized response models:
# mcpgateway/responses.py
from typing import Any

import orjson
from fastapi.responses import Response


class ORJSONResponse(Response):
    """Fast JSON response using orjson (compact output)."""
    media_type = "application/json"

    def render(self, content: Any) -> bytes:
        return orjson.dumps(
            content,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY,
        )


class CompactORJSONResponse(ORJSONResponse):
    """Compact JSON (no indentation) for production; inherits the base render."""


class PrettyORJSONResponse(ORJSONResponse):
    """Pretty (2-space indented) JSON for development."""

    def render(self, content: Any) -> bytes:
        return orjson.dumps(
            content,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_INDENT_2,
        )
2. Gunicorn Configuration
Create optimized gunicorn.conf.py:
# gunicorn.conf.py
import multiprocessing
import os
# Server socket
bind = f"0.0.0.0:{os.getenv('PORT', '4444')}"
backlog = 2048
# Worker processes
workers = int(os.getenv('WORKERS', multiprocessing.cpu_count() * 2 + 1))  # env values are strings; cast to int
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
max_requests = 1000
max_requests_jitter = 50
timeout = 30
keepalive = 5
# Restart workers gracefully
graceful_timeout = 30
reload = os.getenv('ENV') == 'development'
# Logging
accesslog = "-"
errorlog = "-"
loglevel = os.getenv('LOG_LEVEL', 'info')
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
# Process naming
proc_name = 'mcp-gateway'
# Server mechanics
daemon = False
pidfile = '/tmp/mcp-gateway.pid'
worker_tmp_dir = '/dev/shm'
user = None
group = None
tmp_upload_dir = None
# SSL (if needed)
if os.getenv('SSL') == 'true':
    keyfile = os.getenv('KEY_FILE', 'certs/key.pem')
    certfile = os.getenv('CERT_FILE', 'certs/cert.pem')
# StatsD integration (optional)
if os.getenv('STATSD_HOST'):
    statsd_host = os.getenv('STATSD_HOST')
    statsd_prefix = 'mcp.gateway'
# Preload app for memory efficiency
preload_app = True
# Note: threads are ignored by UvicornWorker (one event loop per process);
# this setting only matters if you switch to a sync worker class
threads = 4
def pre_fork(server, worker):
    """Called just before a worker is forked."""
    server.log.info(f"Worker spawned (pid: {worker.pid})")

def post_fork(server, worker):
    """Called just after a worker is forked."""
    server.log.info(f"Worker initialized (pid: {worker.pid})")

def worker_int(worker):
    """Called when worker receives INT or QUIT signal."""
    worker.log.info(f"Worker interrupted (pid: {worker.pid})")

def pre_exec(server):
    """Called just before new master process is forked."""
    server.log.info("Forking new master process")
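Because the config writes a pidfile, zero-downtime reloads can be scripted: Gunicorn treats SIGHUP as "reload configuration and gracefully restart workers". A minimal sketch using the pidfile path from the config above:
# reload_gateway.py - hypothetical helper for graceful reloads
import os
import signal

def graceful_reload(pidfile: str = "/tmp/mcp-gateway.pid") -> None:
    """Send SIGHUP to the Gunicorn master to restart workers gracefully."""
    with open(pidfile) as f:
        master_pid = int(f.read().strip())
    os.kill(master_pid, signal.SIGHUP)

if __name__ == "__main__":
    graceful_reload()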
Update run-gunicorn.sh:
#!/bin/bash
# run-gunicorn.sh - Optimized production server
set -euo pipefail
# Load environment
source .env 2>/dev/null || true
# Set defaults
export WORKERS=${WORKERS:-$(nproc --all)}
export PORT=${PORT:-4444}
export LOG_LEVEL=${LOG_LEVEL:-info}
# Install optimized dependencies if missing
if ! python -c "import uvloop" 2>/dev/null; then
    echo "📦 Installing performance dependencies..."
    pip install -q "uvicorn[standard]" orjson brotli-asgi prometheus-fastapi-instrumentator
fi
# Use optimized Python flags
export PYTHONUNBUFFERED=1
export PYTHONOPTIMIZE=1
export PYTHONDONTWRITEBYTECODE=1
echo "🚀 Starting MCP Gateway (optimized)"
echo " Workers: $WORKERS"
echo " Port: $PORT"
echo " PID: $$"
# Start with configuration file. UvicornWorker selects uvloop and httptools
# automatically when they are installed; gunicorn itself has no --loop/--http flags.
exec gunicorn mcpgateway.main:app \
    --config gunicorn.conf.py \
    --worker-class uvicorn.workers.UvicornWorker
3. Nginx Reverse Proxy Configuration
Create nginx/nginx.conf:
# nginx.conf - High-performance reverse proxy
user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';
    access_log /var/log/nginx/access.log main buffer=16k;

    # Performance optimizations
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100;
    reset_timedout_connection on;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 10;

    # Compression (at nginx level for static content)
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss
               application/rss+xml application/atom+xml image/svg+xml;
    gzip_min_length 1000;
    gzip_disable "msie6";

    # Brotli (requires the ngx_brotli module)
    # brotli on;
    # brotli_comp_level 6;
    # brotli_types text/plain text/css text/xml text/javascript
    #              application/json application/javascript application/xml+rss;

    # Buffer sizes
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 16k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Proxy cache
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mcp_cache:10m
                     max_size=1g inactive=60m use_temp_path=off;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
    limit_req_zone $binary_remote_addr zone=health_limit:1m rate=10r/s;

    # Upstream configuration
    upstream mcp_gateway {
        least_conn;
        # "gateway" is the docker-compose service name;
        # use 127.0.0.1:4444 when nginx runs directly on the host
        server gateway:4444 max_fails=3 fail_timeout=30s;
        # server 127.0.0.1:4445 max_fails=3 fail_timeout=30s backup;

        # Connection pooling (requires proxy_http_version 1.1 and an
        # empty Connection header, set per location below)
        keepalive 32;
        keepalive_requests 100;
        keepalive_timeout 60s;
    }

    # HTTPS server with HTTP/2
    server {
        listen 443 ssl http2;
        server_name mcp-gateway.example.com;

        # SSL configuration
        ssl_certificate /etc/nginx/certs/cert.pem;
        ssl_certificate_key /etc/nginx/certs/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
        ssl_prefer_server_ciphers off;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;
        ssl_session_tickets off;
        # OCSP stapling needs a CA-issued certificate and a resolver directive
        ssl_stapling on;
        ssl_stapling_verify on;

        # Security headers
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-XSS-Protection "1; mode=block" always;
        add_header Referrer-Policy "no-referrer-when-downgrade" always;
        add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

        # API endpoints
        location /api/ {
            # Rate limiting
            limit_req zone=api_limit burst=50 nodelay;

            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;

            # Headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;

            # Connection reuse
            proxy_set_header Connection "";

            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;

            # Buffering
            proxy_buffering on;
            proxy_buffer_size 4k;
            proxy_buffers 8 4k;
            proxy_busy_buffers_size 8k;

            # Cache responses from upstream
            proxy_cache mcp_cache;
            proxy_cache_valid 200 1m;
            proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
            proxy_cache_background_update on;
            proxy_cache_lock on;

            # Don't cache POST/PUT/DELETE
            proxy_cache_methods GET HEAD;
        }

        # SSE endpoints (no buffering)
        location /api/v1/sse {
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Disable buffering and caching for SSE. (X-Accel-Buffering is a
            # *response* header the upstream may send; setting it on the
            # proxied request would do nothing.)
            proxy_buffering off;
            proxy_cache off;

            # Longer timeout for SSE
            proxy_read_timeout 3600s;

            # Standard headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Health checks (lighter rate limit)
        location /health {
            limit_req zone=health_limit burst=5;
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Cache health checks briefly
            proxy_cache mcp_cache;
            proxy_cache_valid 200 5s;
        }

        # Metrics endpoint (internal only)
        location /metrics {
            allow 10.0.0.0/8;
            allow 172.16.0.0/12;
            allow 192.168.0.0/16;
            deny all;
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }

        # Static files (if any)
        location /static/ {
            alias /var/www/mcp-gateway/static/;
            expires 1y;
            add_header Cache-Control "public, immutable";

            # Serve pre-compressed .gz files when present
            gzip_static on;
        }
    }

    # HTTP to HTTPS redirect
    server {
        listen 80;
        server_name mcp-gateway.example.com;
        return 301 https://$server_name$request_uri;
    }
}
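Before running the full benchmark suite, a quick smoke test confirms the proxy actually negotiates HTTP/2 and serves Brotli end to end. A sketch using httpx (assumes pip install "httpx[http2]"; the hostname and self-signed dev certs mirror the config above; note that responses below minimum_size arrive uncompressed by design):
# check_proxy.py - hypothetical HTTP/2 + compression smoke test
import httpx

# verify=False only because the dev certs above are self-signed
with httpx.Client(http2=True, verify=False) as client:
    r = client.get(
        "https://mcp-gateway.example.com/api/v1/config",
        headers={"Accept-Encoding": "br, gzip"},
    )
    print("HTTP version:    ", r.http_version)                     # expect "HTTP/2"
    print("Content-Encoding:", r.headers.get("content-encoding"))  # expect "br" (or "gzip")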
4. Docker Compose Setup
Create docker-compose.prod.yml:
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: mcp-nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
      - nginx-cache:/var/cache/nginx
    depends_on:
      - gateway
    restart: always
    networks:
      - mcp-network

  gateway:
    build:
      context: .
      dockerfile: Containerfile
    container_name: mcp-gateway
    environment:
      - WORKERS=4
      - PORT=4444
      - ENABLE_METRICS=true
    volumes:
      - ./logs:/app/logs
    restart: always
    networks:
      - mcp-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4444/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  prometheus:
    image: prom/prometheus:latest
    container_name: mcp-prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - mcp-network

networks:
  mcp-network:
    driver: bridge

volumes:
  nginx-cache:
  prometheus-data:
5. Benchmarking Suite
Create benchmark/run-benchmarks.sh:
#!/bin/bash
# Comprehensive performance benchmark suite
set -euo pipefail
# Configuration
BASE_URL="${BASE_URL:-http://localhost}"
WARMUP_REQUESTS=1000
BENCHMARK_DURATION=30
CONNECTIONS=100
THREADS=4
# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'
echo -e "${GREEN}🚀 MCP Gateway Performance Benchmark${NC}"
echo "=================================="
date
# Function to check if server is ready
wait_for_server() {
    echo -n "Waiting for server to be ready..."
    for i in {1..30}; do
        if curl -s "$BASE_URL/health" > /dev/null; then
            echo -e " ${GREEN}Ready!${NC}"
            return 0
        fi
        echo -n "."
        sleep 1
    done
    echo -e " ${RED}Timeout!${NC}"
    return 1
}

# Function to run wrk benchmark
run_wrk_test() {
    local endpoint=$1
    local name=$2

    echo -e "\n${YELLOW}📊 Testing: $name${NC}"
    echo "Endpoint: $endpoint"

    wrk -t"$THREADS" -c"$CONNECTIONS" -d"${BENCHMARK_DURATION}s" \
        --latency \
        -H "Accept-Encoding: gzip, br" \
        "$endpoint"
}

# Function to test compression
test_compression() {
    echo -e "\n${YELLOW}📦 Compression Test${NC}"

    # Test uncompressed
    size_raw=$(curl -s -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Uncompressed size: $size_raw bytes"

    # Test gzip
    size_gzip=$(curl -s -H "Accept-Encoding: gzip" -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Gzip compressed: $size_gzip bytes"

    # Test brotli
    size_br=$(curl -s -H "Accept-Encoding: br" -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Brotli compressed: $size_br bytes"

    # Calculate savings
    if [ "$size_raw" -gt 0 ]; then
        gzip_saving=$(( 100 - (size_gzip * 100 / size_raw) ))
        br_saving=$(( 100 - (size_br * 100 / size_raw) ))
        echo -e "${GREEN}Gzip savings: ${gzip_saving}%${NC}"
        echo -e "${GREEN}Brotli savings: ${br_saving}%${NC}"
    fi
}

# Function to test concurrent connections
test_concurrency() {
    echo -e "\n${YELLOW}🔄 Concurrency Test${NC}"

    for conns in 10 50 100 200 500; do
        echo -n "Testing with $conns concurrent connections: "
        result=$(wrk -t4 -c"$conns" -d10s --timeout 10s "$BASE_URL/health" 2>&1 | grep "Requests/sec" | awk '{print $2}')
        echo "$result req/s"
    done
}

# Main benchmark flow
main() {
    # Check prerequisites
    command -v wrk >/dev/null 2>&1 || { echo "Error: wrk not installed"; exit 1; }
    command -v curl >/dev/null 2>&1 || { echo "Error: curl not installed"; exit 1; }

    # Wait for server
    wait_for_server || exit 1

    # Warm up
    echo -e "\n${YELLOW}♨️ Warming up server...${NC}"
    wrk -t2 -c10 -d5s "$BASE_URL/health" > /dev/null 2>&1

    # Run benchmarks
    run_wrk_test "$BASE_URL/health" "Health Check Endpoint"
    run_wrk_test "$BASE_URL/version" "Version Endpoint"
    run_wrk_test "$BASE_URL/api/v1/config" "Config API Endpoint"

    # Test compression
    test_compression

    # Test concurrency scaling
    test_concurrency

    # Memory test
    echo -e "\n${YELLOW}💾 Memory Usage${NC}"
    if command -v docker >/dev/null 2>&1; then
        docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep -E "(NAME|mcp-)"
    fi

    echo -e "\n${GREEN}✅ Benchmark complete!${NC}"
}

# Run with error handling
main "$@"
6. Monitoring Configuration
Create prometheus.yml:
# Prometheus configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'mcp-gateway'
    static_configs:
      - targets: ['gateway:4444']
    metrics_path: '/metrics'

  # Assumes an nginx-prometheus-exporter sidecar on port 9113;
  # nginx does not expose Prometheus metrics on its own
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx:9113']
📊 Performance Tuning Checklist
Application Level:
- ORJSON for fast JSON serialization
- Brotli + GZip compression middleware
- Prometheus metrics integration
- Connection pooling for DB/Redis (see the sketch after this checklist)
- Async all the way down
Server Level:
- Uvloop event loop
- HTTPTools parser
- Optimized worker count
- Shared memory for temp files
- Preloaded application
Proxy Level:
- HTTP/2 support
- Connection pooling to upstream
- Response caching
- Rate limiting
- Static file optimization
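For the connection-pooling checklist item, a sketch of pooled async clients using SQLAlchemy's async engine and redis-py's asyncio client (the DSN, host, and pool sizes are illustrative, not gateway defaults):
# pools.py - hypothetical connection pooling setup
from redis.asyncio import Redis
from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

def build_pools() -> tuple[AsyncEngine, Redis]:
    """Create pooled clients. With preload_app=True the app is imported in the
    Gunicorn master, so call this after fork (e.g. in a FastAPI startup
    handler), not at import time."""
    engine = create_async_engine(
        "postgresql+asyncpg://mcp:secret@localhost/mcp",  # hypothetical DSN
        pool_size=10,         # persistent connections per worker
        max_overflow=20,      # burst headroom beyond pool_size
        pool_pre_ping=True,   # detect stale connections before use
    )
    redis = Redis(host="localhost", port=6379, max_connections=50)
    return engine, redis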
🚀 Deployment Steps
- Install dependencies:
make install-perf
- Put the configuration files in place:
gunicorn.conf.py at the repository root; nginx.conf under nginx/ (the path docker-compose mounts into the proxy container)
- Generate certificates:
make certs
- Start services:
docker-compose -f docker-compose.prod.yml up -d
- Run benchmarks:
./benchmark/run-benchmarks.sh
📈 Expected Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Throughput | ~2,000 req/s | ~8,000 req/s | 4x |
| P95 Latency | 50ms | 12ms | 4x faster |
| Response Size | 100KB | 20KB | 80% smaller |
| CPU Usage | 80% | 40% | 50% reduction |
🔧 Monitoring & Tuning
Access metrics:
- Prometheus: http://localhost:9090
- Grafana (if added to the compose stack): http://localhost:3000
- Application metrics: http://localhost/metrics
Key metrics to watch:
- http_requests_total
- http_request_duration_seconds
- mcp_requests_inprogress (named via inprogress_name above)
- python_gc_collections_total
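To pull, say, the P95 latency from those series programmatically, query Prometheus's standard /api/v1/query HTTP API; a minimal sketch:
# query_p95.py - sketch of a PromQL query against the Prometheus container
import httpx

PROM = "http://localhost:9090"
QUERY = 'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))'

resp = httpx.get(f"{PROM}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], "p95 =", series["value"][1], "s")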
This complete implementation provides enterprise-grade performance with minimal complexity, using battle-tested components (nginx + Gunicorn + FastAPI).