This repository contains a Docker Compose setup for running TensorZero locally with SGLang and the Qwen3-32B-AWQ model.
TensorZero is a platform for building reliable LLM applications. This setup includes:
- SGLang: Serving the Qwen3-32B-AWQ model locally (using a custom Docker image with vLLM 0.8.4)
- ClickHouse: Database for storing inference data and analytics
- TensorZero Gateway: HTTP API for interacting with models
- TensorZero UI: Web interface for monitoring and analytics
This setup uses a custom SGLang Docker image that includes vLLM 0.8.4 for compatibility. The custom image is built from the Dockerfile in this repository: it is built automatically the first time you run `docker-compose up`, and subsequent runs reuse the cached image unless you rebuild it explicitly.
- Docker and Docker Compose
- NVIDIA GPU with sufficient VRAM (recommended: ~24GB for Qwen3-32B-AWQ)
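As a quick sanity check before starting, you can query the GPU's total VRAM against the ~24GB recommendation above. This is a minimal sketch using `nvidia-smi` (it assumes the NVIDIA driver is installed; it returns `None` rather than failing when it isn't):

```python
import shutil
import subprocess

def gpu_vram_mib():
    """Total VRAM of the first GPU in MiB, or None if it can't be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.splitlines()[0].strip())
    except (subprocess.CalledProcessError, ValueError, IndexError):
        return None

vram = gpu_vram_mib()
if vram is not None and vram < 24 * 1024:
    print(f"Warning: {vram} MiB VRAM may not be enough for Qwen3-32B-AWQ")
```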
- Copy the environment template:
cp .env.example .env
- Edit `.env` and set your API keys:
# Required: OpenAI API key for TensorZero services
OPENAI_API_KEY=your_openai_api_key_here
# Required: HuggingFace token for model access
HF_TOKEN=your_huggingface_token_here
- Clone this repository:
git clone <your-repo-url>
cd tensorzero-sglang
- Set up environment variables (see Environment Setup above)
- Build and start the services:
docker-compose up -d
Note: The first run will take some time as it builds the custom SGLang image with vLLM 0.8.4.
- Wait for all services to be healthy:
docker-compose ps
- Access the TensorZero UI at http://localhost:4000
- Port: 30000
- Model: Qwen3-32B-AWQ
- Health Check: http://localhost:30000/health
- Port: 8123
- Username: chuser
- Password: chpassword
- Database: tensorzero
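With the credentials above you can query the stored inference data directly over ClickHouse's HTTP interface. A minimal standard-library sketch; the `ChatInference` table name is an assumption based on TensorZero's ClickHouse schema:

```python
import base64
import urllib.parse
import urllib.request

def clickhouse_query(sql, host="localhost", port=8123,
                     user="chuser", password="chpassword",
                     database="tensorzero"):
    """Build an authenticated request for ClickHouse's HTTP interface."""
    params = urllib.parse.urlencode({"database": database, "query": sql})
    req = urllib.request.Request(f"http://{host}:{port}/?{params}")
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {creds}")
    return req

# Count stored chat inferences (table name assumed from TensorZero's schema):
req = clickhouse_query("SELECT count() FROM ChatInference")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode().strip())
```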
- Port: 3020
- API Endpoint: http://localhost:3020
- Port: 4000
- URL: http://localhost:4000
The TensorZero configuration is defined in `config/tensorzero.toml`:
- Model: `qwen3_local`, which points to the SGLang server
- Function: `qwen`, a chat completion function using the local model
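A configuration along these lines would wire the `qwen` function to the local model. This is a hedged sketch, not the exact file contents: the provider name, `api_base`, and `model_name` values are assumptions based on TensorZero's OpenAI-compatible provider and the SGLang port used in this setup.

```toml
[models.qwen3_local]
routing = ["sglang"]

[models.qwen3_local.providers.sglang]
type = "openai"
api_base = "http://sglang:30000/v1"
model_name = "Qwen/Qwen3-32B-AWQ"

[functions.qwen]
type = "chat"

[functions.qwen.variants.default]
type = "chat_completion"
model = "qwen3_local"
```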
from tensorzero import TensorZeroGateway

# Connect to the local TensorZero Gateway
client = TensorZeroGateway("http://localhost:3020")

response = client.inference(
    function_name="qwen",
    input={
        "messages": [
            {"role": "user", "content": "Hello, how are you?"}
        ]
    },
)

# For chat functions, response.content is a list of content blocks
print(response.content)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3020/openai/v1",  # TensorZero's OpenAI-compatible endpoint
    api_key="dummy",  # Not required for local setup
)

response = client.chat.completions.create(
    model="tensorzero::function_name::qwen",  # route through the TensorZero function
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
curl -X POST http://localhost:3020/inference \
-H "Content-Type: application/json" \
-d '{
"function_name": "qwen",
"input": {
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}
}'
- TensorZero UI: http://localhost:4000 - View inference analytics, model performance, and system metrics
- SGLang Health: http://localhost:30000/health - Check model server status
- ClickHouse: http://localhost:8123 - Direct database access
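The endpoints above can be polled from a small script, e.g. to block a deployment step until everything is up. A standard-library-only sketch; the gateway's `/health` path and the retry/delay values are assumptions:

```python
import time
import urllib.error
import urllib.request

# URLs for the services above; the gateway's /health path is an assumption.
SERVICES = {
    "sglang": "http://localhost:30000/health",
    "gateway": "http://localhost:3020/health",
    "ui": "http://localhost:4000",
}

def check(url, timeout=5):
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(services=SERVICES, retries=30, delay=10):
    """Poll every service until all report healthy or retries run out."""
    status = {}
    for _ in range(retries):
        status = {name: check(url) for name, url in services.items()}
        if all(status.values()):
            break
        time.sleep(delay)
    return status
```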
- GPU Memory Issues
  - Ensure you have sufficient VRAM (24GB+ recommended)
  - Consider using a smaller model if memory is limited
- Service Startup Failures
  - Check Docker logs: docker-compose logs [service-name]
  - Ensure all environment variables are set correctly
- Model Download Issues
  - Verify your HuggingFace token has access to the model
  - Check internet connectivity for model downloads
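To confirm the token can actually see the model, you can hit the Hugging Face Hub API directly. A standard-library sketch that reads `HF_TOKEN` from the environment; the repo id `Qwen/Qwen3-32B-AWQ` is assumed from this setup:

```python
import json
import os
import urllib.error
import urllib.request

def check_model_access(repo_id="Qwen/Qwen3-32B-AWQ"):
    """Return True if the Hub API reports the repo as visible to HF_TOKEN."""
    req = urllib.request.Request(f"https://huggingface.co/api/models/{repo_id}")
    token = os.environ.get("HF_TOKEN")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp).get("id") == repo_id
    except urllib.error.HTTPError as e:
        print(f"HTTP {e.code}: token may lack access to {repo_id}")
        return False
    except OSError:
        print("Network unreachable; check connectivity")
        return False
```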
# View logs for all services
docker-compose logs -f
# View logs for specific service
docker-compose logs -f sglang
# Rebuild the custom SGLang image (if Dockerfile changes)
docker-compose build sglang
# Rebuild and restart all services
docker-compose up -d --build
# Force rebuild without cache
docker-compose build --no-cache sglang
# Restart services
docker-compose restart
# Stop all services
docker-compose down
# Remove all containers and volumes
docker-compose down -v
- Add model configuration to `config/tensorzero.toml`
- Update the SGLang service in `docker-compose.yml`
- Restart the services

Modify `config/tensorzero.toml` to add:
- New models and providers
- Custom functions and variants
- Routing configurations
- For production use, refer to TensorZero deployment docs
- Use proper authentication in production environments
This project is provided as-is for educational and development purposes.