
TensorZero SGLang Setup

This repository contains a Docker Compose setup for running TensorZero locally with SGLang and the Qwen3-32B-AWQ model.

Overview

TensorZero is a platform for building reliable LLM applications. This setup includes:

  • SGLang: Serving the Qwen3-32B-AWQ model locally (using a custom Docker image with vLLM 0.8.4)
  • ClickHouse: Database for storing inference data and analytics
  • TensorZero Gateway: HTTP API for interacting with models
  • TensorZero UI: Web interface for monitoring and analytics

Custom SGLang Image

This setup uses a custom SGLang Docker image that includes vLLM 0.8.4 for compatibility. The custom image is built from the Dockerfile in this repository. The image is automatically built when you run docker-compose up for the first time. Subsequent runs will use the cached image unless you rebuild it explicitly.
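The Dockerfile in this repository is the source of truth, but the general shape of such an image is roughly this (a sketch; the base image tag and pip invocation are assumptions, not the repository's actual contents):

```dockerfile
# Sketch only: start from an SGLang base image and pin vLLM for compatibility.
# Check the real Dockerfile in this repository for the exact base image and steps.
FROM lmsysorg/sglang:latest
RUN pip install --no-cache-dir vllm==0.8.4
```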

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with sufficient VRAM (recommended: ~24GB for Qwen3-32B-AWQ)

Environment Setup

  1. Copy the environment template:

     cp .env.example .env

  2. Edit .env and set your API keys:

     # Required: OpenAI API key for TensorZero services
     OPENAI_API_KEY=your_openai_api_key_here

     # Required: HuggingFace token for model access
     HF_TOKEN=your_huggingface_token_here
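A quick way to catch a missing key before the containers fail at runtime is a small preflight check. This is a sketch (the `missing_vars` helper is hypothetical, not part of this repository); the key names mirror .env.example:

```python
import os

# Keys required by .env.example; missing_vars is a hypothetical helper,
# not part of this repository.
REQUIRED = ["OPENAI_API_KEY", "HF_TOKEN"]

def missing_vars(env=os.environ):
    """Return the required keys that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Set these in .env: {', '.join(missing)}")
    print("Environment OK")
```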

Quick Start

  1. Clone this repository:

     git clone <your-repo-url>
     cd tensorzero-sglang

  2. Set up environment variables (see Environment Setup above)

  3. Build and start the services:

     docker-compose up -d

     Note: The first run will take some time, as it builds the custom SGLang image with vLLM 0.8.4.

  4. Wait for all services to be healthy:

     docker-compose ps

  5. Access the TensorZero UI at http://localhost:4000

Services

SGLang (Model Server)

  • Serves the Qwen3-32B-AWQ model; port and GPU settings are defined in docker-compose.yml

ClickHouse (Database)

  • Port: 8123
  • Username: chuser
  • Password: chpassword
  • Database: tensorzero

TensorZero Gateway

  • Port: 3020 (HTTP API used in the Usage examples below)

TensorZero UI

  • Port: 4000 (http://localhost:4000)

Configuration

The TensorZero configuration is defined in config/tensorzero.toml:

  • Model: qwen3_local - Points to the SGLang server
  • Function: qwen - Chat completion function using the local model

Usage

Using the Python Client

from tensorzero import TensorZeroGateway

client = TensorZeroGateway("http://localhost:3020")

response = client.inference(
    function_name="qwen",
    input={
        "messages": [
            {"role": "user", "content": "Hello, how are you?"}
        ]
    }
)

print(response.content)

Using OpenAI Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3020/v1",
    api_key="dummy"  # Not required for local setup
)

response = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

Using cURL

curl -X POST http://localhost:3020/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "qwen",
    "input": {
      "messages": [
        {"role": "user", "content": "Hello, how are you?"}
      ]
    }
  }'
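All three interfaces above send the same request shape to the gateway. A minimal helper (hypothetical name, not part of the TensorZero client) that builds the body for POST /inference:

```python
# build_inference_payload is a hypothetical helper illustrating the
# request body that POST /inference expects; it mirrors the cURL example.
def build_inference_payload(function_name, user_message):
    """Return the JSON body for TensorZero's /inference endpoint."""
    return {
        "function_name": function_name,
        "input": {
            "messages": [{"role": "user", "content": user_message}],
        },
    }

payload = build_inference_payload("qwen", "Hello, how are you?")
```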

Monitoring

All inference requests and responses are stored in ClickHouse; browse them in the TensorZero UI at http://localhost:4000.

Troubleshooting

Common Issues

  1. GPU Memory Issues

    • Ensure you have sufficient VRAM (24GB+ recommended)
    • Consider using a smaller model if memory is limited
  2. Service Startup Failures

    • Check Docker logs: docker-compose logs [service-name]
    • Ensure all environment variables are set correctly
  3. Model Download Issues

    • Verify your HuggingFace token has access to the model
    • Check internet connectivity for model downloads

Useful Commands

# View logs for all services
docker-compose logs -f

# View logs for specific service
docker-compose logs -f sglang

# Rebuild the custom SGLang image (if Dockerfile changes)
docker-compose build sglang

# Rebuild and restart all services
docker-compose up -d --build

# Force rebuild without cache
docker-compose build --no-cache sglang

# Restart services
docker-compose restart

# Stop all services
docker-compose down

# Remove all containers and volumes
docker-compose down -v

Development

Adding New Models

  1. Add model configuration to config/tensorzero.toml
  2. Update the SGLang service in docker-compose.yml
  3. Restart the services

Custom Configuration

Modify config/tensorzero.toml to add:

  • New models and providers
  • Custom functions and variants
  • Routing configurations
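For example, adding a second variant to the qwen function might look like this (a sketch; the field names follow TensorZero's variant configuration format, and the parameter values are illustrative assumptions):

```toml
# Illustrative only: a lower-temperature variant alongside the default.
[functions.qwen.variants.low_temperature]
type = "chat_completion"
model = "qwen3_local"
temperature = 0.2
```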

Security Notes

  • The default ClickHouse credentials (chuser/chpassword) and the placeholder API key in the OpenAI client example are intended for local development only; change them before exposing any service beyond your machine.
  • Keep .env out of version control, as it contains your OPENAI_API_KEY and HF_TOKEN.

License

This project is provided as-is for educational and development purposes.
