
[PERFORMANCE]: Performance Optimization Implementation and Guide for MCP Gateway (baseline) #432

Description

🎯 Performance Optimization Strategy

Goal: Implement compression, HTTP/2, and FastAPI optimizations using an nginx reverse proxy in front of optimized Gunicorn workers.

📋 Complete Implementation Plan

1. FastAPI Application Optimizations

Update mcpgateway/main.py:

# mcpgateway/main.py
import time

from fastapi import FastAPI, Request
from fastapi.responses import ORJSONResponse
from brotli_asgi import BrotliMiddleware
from starlette.middleware.gzip import GZipMiddleware
from prometheus_fastapi_instrumentator import Instrumentator

# Create app with optimized JSON response
app = FastAPI(
    title="MCP Gateway",
    default_response_class=ORJSONResponse,  # 2-4x faster JSON
    docs_url="/docs",
    redoc_url="/redoc"
)

# Configure compression middleware (order matters!)
# 1. Brotli first (better compression for modern browsers)
app.add_middleware(
    BrotliMiddleware,
    quality=4,              # 0-11; 4 balances speed and ratio
    mode="text",           # Optimize for JSON/text
    lgwin=22,              # Window size
    minimum_size=1024      # Skip small responses
)

# 2. GZip fallback for older clients.
# Note: brotli_asgi already falls back to gzip by default
# (gzip_fallback=True), so this explicit middleware is optional.
app.add_middleware(
    GZipMiddleware,
    minimum_size=1024,     # Only compress >1KB
    compresslevel=6        # Balanced speed/ratio
)

# 3. Prometheus metrics
instrumentator = Instrumentator(
    should_group_status_codes=True,
    should_ignore_untemplated=True,
    should_respect_env_var=True,
    should_instrument_requests_inprogress=True,
    excluded_handlers=[".*admin.*", "/metrics"],
    env_var_name="ENABLE_METRICS",
    inprogress_name="mcp_requests_inprogress",
    inprogress_labels=True,
)
instrumentator.instrument(app).expose(app)

# Add custom headers for debugging
@app.middleware("http")
async def add_performance_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Process-Time"] = str(time.time() - start_time)
    response.headers["X-Server"] = "MCP-Gateway"
    return response
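
The "2-4x faster JSON" figure is easy to sanity-check locally. A rough micro-benchmark sketch (the payload shape is illustrative, not taken from the gateway):

# bench_json.py - rough orjson vs stdlib json comparison (illustrative payload)
import json
import timeit

import orjson

payload = {"tools": [{"id": i, "name": f"tool-{i}", "tags": ["a", "b"]}
                     for i in range(1000)]}

# Encode the stdlib output to bytes so both sides produce comparable results.
std = timeit.timeit(lambda: json.dumps(payload).encode(), number=500)
fast = timeit.timeit(lambda: orjson.dumps(payload), number=500)
print(f"stdlib json: {std:.3f}s  orjson: {fast:.3f}s  speedup: {std / fast:.1f}x")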

Create optimized response models:

# mcpgateway/responses.py
from typing import Any, Optional
import orjson
from fastapi.responses import Response

class ORJSONResponse(Response):
    """Fast JSON response using orjson."""
    media_type = "application/json"

    def render(self, content: Any) -> bytes:
        return orjson.dumps(
            content,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY
        )

class CompactORJSONResponse(ORJSONResponse):
    """Compact JSON (no indentation) for production.

    orjson emits compact output by default, so this is an explicit
    alias of the base class for readability at call sites.
    """

class PrettyORJSONResponse(ORJSONResponse):
    """Pretty JSON for development."""
    def render(self, content: Any) -> bytes:
        return orjson.dumps(
            content,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_INDENT_2
        )
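
A sketch of wiring these up per environment (reusing the ENV variable from gunicorn.conf.py below; the route is illustrative):

# mcpgateway/main.py (sketch)
import os

from fastapi import FastAPI

from mcpgateway.responses import CompactORJSONResponse, PrettyORJSONResponse

# Pretty-print JSON in development, compact bytes everywhere else.
response_class = (
    PrettyORJSONResponse if os.getenv("ENV") == "development"
    else CompactORJSONResponse
)
app = FastAPI(title="MCP Gateway", default_response_class=response_class)

@app.get("/api/v1/config")
async def config():
    return {"compression": ["br", "gzip"], "workers": 4}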

2. Gunicorn Configuration

Create optimized gunicorn.conf.py:

# gunicorn.conf.py
import multiprocessing
import os

# Server socket
bind = f"0.0.0.0:{os.getenv('PORT', '4444')}"
backlog = 2048

# Worker processes
workers = int(os.getenv('WORKERS', multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
max_requests = 1000
max_requests_jitter = 50
timeout = 30
keepalive = 5

# Restart workers gracefully
graceful_timeout = 30
reload = os.getenv('ENV') == 'development'

# Logging
accesslog = "-"
errorlog = "-"
loglevel = os.getenv('LOG_LEVEL', 'info')
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = 'mcp-gateway'

# Server mechanics
daemon = False
pidfile = '/tmp/mcp-gateway.pid'
worker_tmp_dir = '/dev/shm'
user = None
group = None
tmp_upload_dir = None

# SSL (if needed)
if os.getenv('SSL') == 'true':
    keyfile = os.getenv('KEY_FILE', 'certs/key.pem')
    certfile = os.getenv('CERT_FILE', 'certs/cert.pem')

# StatsD integration (optional)
if os.getenv('STATSD_HOST'):
    statsd_host = os.getenv('STATSD_HOST')
    statsd_prefix = 'mcp.gateway'

# Preload app for memory efficiency
# (note: preloading is incompatible with --reload in development)
preload_app = True

# Thread settings only apply to sync/gthread worker classes;
# UvicornWorker ignores this, but it is harmless to leave set.
threads = 4

def pre_fork(server, worker):
    """Called just before a worker is forked."""
    server.log.info(f"Worker spawned (pid: {worker.pid})")

def post_fork(server, worker):
    """Called just after a worker is forked."""
    server.log.info(f"Worker initialized (pid: {worker.pid})")

def worker_int(worker):
    """Called when worker receives INT or QUIT signal."""
    worker.log.info(f"Worker interrupted (pid: {worker.pid})")

def pre_exec(server):
    """Called just before new master process is forked."""
    server.log.info("Forking new master process")

Update run-gunicorn.sh:

#!/bin/bash
# run-gunicorn.sh - Optimized production server

set -euo pipefail

# Load environment
source .env 2>/dev/null || true

# Set defaults
export WORKERS=${WORKERS:-$(nproc --all)}
export PORT=${PORT:-4444}
export LOG_LEVEL=${LOG_LEVEL:-info}

# Install optimized dependencies if missing
if ! python -c "import uvloop" 2>/dev/null; then
    echo "📦 Installing performance dependencies..."
    pip install -q "uvicorn[standard]" orjson brotli-asgi prometheus-fastapi-instrumentator
fi

# Use optimized Python flags
export PYTHONUNBUFFERED=1
export PYTHONOPTIMIZE=1
export PYTHONDONTWRITEBYTECODE=1

echo "🚀 Starting MCP Gateway (optimized)"
echo "   Workers: $WORKERS"
echo "   Port: $PORT"
echo "   PID: $$"

# Start with configuration file.
# Note: --loop/--http are uvicorn options, not gunicorn flags;
# UvicornWorker selects uvloop and httptools automatically when
# uvicorn[standard] is installed (see the worker subclass sketch
# below to pin them explicitly).
exec gunicorn mcpgateway.main:app \
    --config gunicorn.conf.py \
    --worker-class uvicorn.workers.UvicornWorker
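
Gunicorn itself has no --loop or --http options; those belong to uvicorn. To pin the event loop and parser instead of relying on auto-detection, a minimal custom-worker sketch (the mcpgateway/workers.py path is an assumption):

# mcpgateway/workers.py (hypothetical module path)
from uvicorn.workers import UvicornWorker


class MCPGatewayWorker(UvicornWorker):
    """UvicornWorker pinned to uvloop + httptools."""

    # CONFIG_KWARGS is merged into the uvicorn Config of each worker.
    CONFIG_KWARGS = {"loop": "uvloop", "http": "httptools"}

Then start Gunicorn with --worker-class mcpgateway.workers.MCPGatewayWorker.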

3. Nginx Reverse Proxy Configuration

Create nginx/nginx.conf:

# nginx.conf - High-performance reverse proxy

user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';

    access_log /var/log/nginx/access.log main buffer=16k;

    # Performance optimizations
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100;
    reset_timedout_connection on;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 10;

    # Compression (at nginx level for static content)
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript 
               application/json application/javascript application/xml+rss
               application/rss+xml application/atom+xml image/svg+xml;
    gzip_min_length 1000;
    gzip_disable "msie6";

    # Brotli (if nginx has module)
    # brotli on;
    # brotli_comp_level 6;
    # brotli_types text/plain text/css text/xml text/javascript
    #              application/json application/javascript application/xml+rss;

    # Buffer sizes
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 16k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Proxy cache
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mcp_cache:10m 
                     max_size=1g inactive=60m use_temp_path=off;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
    limit_req_zone $binary_remote_addr zone=health_limit:1m rate=10r/s;

    # Upstream configuration
    upstream mcp_gateway {
        least_conn;
        
        # Multiple backend servers if needed
        server 127.0.0.1:4444 max_fails=3 fail_timeout=30s;
        # server 127.0.0.1:4445 max_fails=3 fail_timeout=30s backup;
        
        # Connection pooling
        keepalive 32;
        keepalive_requests 100;
        keepalive_timeout 60s;
    }

    # HTTPS server with HTTP/2
    server {
        listen 443 ssl http2;
        server_name mcp-gateway.example.com;

        # SSL configuration
        ssl_certificate /etc/nginx/certs/cert.pem;
        ssl_certificate_key /etc/nginx/certs/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
        ssl_prefer_server_ciphers off;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;
        ssl_session_tickets off;
        ssl_stapling on;
        ssl_stapling_verify on;

        # Security headers
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-XSS-Protection "1; mode=block" always;
        add_header Referrer-Policy "no-referrer-when-downgrade" always;
        add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

        # API endpoints
        location /api/ {
            # Rate limiting
            limit_req zone=api_limit burst=50 nodelay;
            
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            
            # Headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;
            
            # Connection reuse
            proxy_set_header Connection "";
            
            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
            
            # Buffering
            proxy_buffering on;
            proxy_buffer_size 4k;
            proxy_buffers 8 4k;
            proxy_busy_buffers_size 8k;
            
            # Cache headers from upstream
            proxy_cache mcp_cache;
            proxy_cache_valid 200 1m;
            proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
            proxy_cache_background_update on;
            proxy_cache_lock on;
            
            # Don't cache POST/PUT/DELETE
            proxy_cache_methods GET HEAD;
        }

        # SSE endpoints (no buffering)
        location /api/v1/sse {
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            
            # SSE specific
            proxy_set_header Connection "";
            proxy_set_header Cache-Control "no-cache";
            proxy_set_header X-Accel-Buffering "no";
            
            # Disable buffering for SSE
            proxy_buffering off;
            proxy_cache off;
            
            # Longer timeout for SSE
            proxy_read_timeout 3600s;
            
            # Standard headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Health checks (lighter rate limit)
        location /health {
            limit_req zone=health_limit burst=5;
            
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            
            # Cache health checks
            proxy_cache mcp_cache;
            proxy_cache_valid 200 5s;
        }

        # Metrics endpoint (internal only)
        location /metrics {
            allow 10.0.0.0/8;
            allow 172.16.0.0/12;
            allow 192.168.0.0/16;
            deny all;
            
            proxy_pass http://mcp_gateway;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }

        # Static files (if any)
        location /static/ {
            alias /var/www/mcp-gateway/static/;
            expires 1y;
            add_header Cache-Control "public, immutable";
            
            # Enable gzip for static files
            gzip_static on;
        }
    }

    # HTTP to HTTPS redirect
    server {
        listen 80;
        server_name mcp-gateway.example.com;
        return 301 https://$server_name$request_uri;
    }
}
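
A quick smoke test for the api_limit zone: fire a burst well past the 100 r/s rate plus burst=50 allowance and count the rejections (nginx answers rate-limited requests with 503 unless limit_req_status says otherwise; the hostname is the placeholder from the config above):

# ratelimit_check.py - burst test for the nginx api_limit zone (sketch)
import asyncio
from collections import Counter

import httpx

async def main() -> None:
    async with httpx.AsyncClient(
        base_url="https://mcp-gateway.example.com", verify=False
    ) as client:
        responses = await asyncio.gather(
            *(client.get("/api/v1/config") for _ in range(300))
        )
    # Expect mostly 200s plus a tail of 503s once the burst bucket drains.
    print(dict(Counter(r.status_code for r in responses)))

asyncio.run(main())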

4. Docker Compose Setup

Create docker-compose.prod.yml:

version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: mcp-nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
      - nginx-cache:/var/cache/nginx
    depends_on:
      - gateway
    restart: always
    networks:
      - mcp-network

  gateway:
    build:
      context: .
      dockerfile: Containerfile
    container_name: mcp-gateway
    environment:
      - WORKERS=4
      - PORT=4444
      - ENABLE_METRICS=true
    volumes:
      - ./logs:/app/logs
    restart: always
    networks:
      - mcp-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4444/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  prometheus:
    image: prom/prometheus:latest
    container_name: mcp-prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - mcp-network

networks:
  mcp-network:
    driver: bridge

volumes:
  nginx-cache:
  prometheus-data:
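
Once the stack is up, compression can be verified end to end through nginx. A sketch using httpx (hostname reused from the nginx config; verify=False assumes the self-signed certs from make certs):

# compression_check.py - verify negotiated encoding through nginx (sketch)
import httpx

for encoding in ("gzip", "br", "identity"):
    r = httpx.get(
        "https://mcp-gateway.example.com/api/v1/config",
        headers={"Accept-Encoding": encoding},
        verify=False,  # self-signed certs in dev
    )
    # httpx decompresses transparently; the header shows what was negotiated.
    print(encoding, "->", r.headers.get("Content-Encoding", "none"),
          f"{r.num_bytes_downloaded} bytes on the wire")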

5. Benchmarking Suite

Create benchmark/run-benchmarks.sh:

#!/bin/bash
# Comprehensive performance benchmark suite

set -euo pipefail

# Configuration
BASE_URL="${BASE_URL:-http://localhost}"
WARMUP_REQUESTS=1000
BENCHMARK_DURATION=30
CONNECTIONS=100
THREADS=4

# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

echo -e "${GREEN}🚀 MCP Gateway Performance Benchmark${NC}"
echo "=================================="
date

# Function to check if server is ready
wait_for_server() {
    echo -n "Waiting for server to be ready..."
    for i in {1..30}; do
        if curl -s "$BASE_URL/health" > /dev/null; then
            echo -e " ${GREEN}Ready!${NC}"
            return 0
        fi
        echo -n "."
        sleep 1
    done
    echo -e " ${RED}Timeout!${NC}"
    return 1
}

# Function to run wrk benchmark
run_wrk_test() {
    local endpoint=$1
    local name=$2
    
    echo -e "\n${YELLOW}📊 Testing: $name${NC}"
    echo "Endpoint: $endpoint"
    
    wrk -t$THREADS -c$CONNECTIONS -d${BENCHMARK_DURATION}s \
        --latency \
        -H "Accept-Encoding: gzip, br" \
        "$endpoint"
}

# Function to test compression
test_compression() {
    echo -e "\n${YELLOW}📦 Compression Test${NC}"
    
    # Test uncompressed
    size_raw=$(curl -s -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Uncompressed size: $size_raw bytes"
    
    # Test gzip
    size_gzip=$(curl -s -H "Accept-Encoding: gzip" -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Gzip compressed: $size_gzip bytes"
    
    # Test brotli
    size_br=$(curl -s -H "Accept-Encoding: br" -o /dev/null -w "%{size_download}" "$BASE_URL/api/v1/config")
    echo "Brotli compressed: $size_br bytes"
    
    # Calculate savings
    if [ $size_raw -gt 0 ]; then
        gzip_saving=$(( 100 - (size_gzip * 100 / size_raw) ))
        br_saving=$(( 100 - (size_br * 100 / size_raw) ))
        echo -e "${GREEN}Gzip savings: ${gzip_saving}%${NC}"
        echo -e "${GREEN}Brotli savings: ${br_saving}%${NC}"
    fi
}

# Function to test concurrent connections
test_concurrency() {
    echo -e "\n${YELLOW}🔄 Concurrency Test${NC}"
    
    for conns in 10 50 100 200 500; do
        echo -n "Testing with $conns concurrent connections: "
        result=$(wrk -t4 -c$conns -d10s --timeout 10s "$BASE_URL/health" 2>&1 | grep "Requests/sec" | awk '{print $2}')
        echo "$result req/s"
    done
}

# Main benchmark flow
main() {
    # Check prerequisites
    command -v wrk >/dev/null 2>&1 || { echo "Error: wrk not installed"; exit 1; }
    command -v curl >/dev/null 2>&1 || { echo "Error: curl not installed"; exit 1; }
    
    # Wait for server
    wait_for_server || exit 1
    
    # Warm up
    echo -e "\n${YELLOW}♨️  Warming up server...${NC}"
    wrk -t2 -c10 -d5s "$BASE_URL/health" > /dev/null 2>&1
    
    # Run benchmarks
    run_wrk_test "$BASE_URL/health" "Health Check Endpoint"
    run_wrk_test "$BASE_URL/version" "Version Endpoint"
    run_wrk_test "$BASE_URL/api/v1/config" "Config API Endpoint"
    
    # Test compression
    test_compression
    
    # Test concurrency scaling
    test_concurrency
    
    # Memory test
    echo -e "\n${YELLOW}💾 Memory Usage${NC}"
    if command -v docker >/dev/null 2>&1; then
        docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep -E "(NAME|mcp-)"
    fi
    
    echo -e "\n${GREEN}✅ Benchmark complete!${NC}"
}

# Run with error handling
main "$@"

6. Monitoring Configuration

Create prometheus.yml:

# Prometheus configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'mcp-gateway'
    static_configs:
      - targets: ['gateway:4444']
    metrics_path: '/metrics'
    
  - job_name: 'nginx'
    # Assumes an nginx-prometheus-exporter sidecar listening on 9113
    # (not included in docker-compose.prod.yml above).
    static_configs:
      - targets: ['nginx:9113']

📊 Performance Tuning Checklist

Application Level:

  • ORJSON for fast JSON serialization
  • Brotli + GZip compression middleware
  • Prometheus metrics integration
  • Connection pooling for DB/Redis (see the sketch after this list)
  • Async all the way down
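
A minimal pooling sketch with redis.asyncio (the URL, pool size, and module path are illustrative assumptions, not values from this repo):

# mcpgateway/cache.py (hypothetical module path)
import redis.asyncio as redis

# One shared client per worker process; each command checks a connection
# out of the underlying pool and returns it when done.
cache = redis.Redis.from_url("redis://localhost:6379/0", max_connections=50)

async def get_cached(key: str) -> bytes | None:
    return await cache.get(key)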

Server Level:

  • Uvloop event loop
  • HTTPTools parser
  • Optimized worker count
  • Shared memory for temp files
  • Preloaded application

Proxy Level:

  • HTTP/2 support
  • Connection pooling to upstream
  • Response caching
  • Rate limiting
  • Static file optimization

🚀 Deployment Steps

  1. Install dependencies:
     make install-perf
  2. Update configuration:
     cp gunicorn.conf.py .
     cp nginx/nginx.conf nginx/
  3. Generate certificates:
     make certs
  4. Start services:
     docker-compose -f docker-compose.prod.yml up -d
  5. Run benchmarks:
     ./benchmark/run-benchmarks.sh

📈 Expected Results

Metric          Before         After          Improvement
Throughput      ~2,000 req/s   ~8,000 req/s   4x
P95 Latency     50 ms          12 ms          4x faster
Response Size   100 KB         20 KB          80% smaller
CPU Usage       80%            40%            50% reduction

🔧 Monitoring & Tuning

Access metrics at the gateway's /metrics endpoint (restricted to internal networks by the nginx config above) or through the Prometheus UI on port 9090.

Key metrics to watch (a sample query follows the list):

  • http_requests_total
  • http_request_duration_seconds
  • mcp_requests_inprogress (the inprogress_name set in the instrumentator config)
  • python_gc_collections_total
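
For example, P95 latency can be pulled from the Prometheus HTTP API with a histogram_quantile query over the duration histogram (address from the compose file; exact label names depend on the instrumentator version):

# p95_latency.py - query P95 latency from Prometheus (sketch)
import requests

QUERY = ('histogram_quantile(0.95, '
         'sum(rate(http_request_duration_seconds_bucket[5m])) by (le))')

resp = requests.get("http://localhost:9090/api/v1/query",
                    params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(f"P95 latency: {float(result['value'][1]) * 1000:.1f} ms")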

This complete implementation provides enterprise-grade performance with minimal complexity, using battle-tested components (nginx + Gunicorn + FastAPI).
