This document describes how to run Sugar-AI, test recent changes, and troubleshoot common issues.
Sugar-AI provides a Docker-based deployment option for an isolated and reproducible environment.
Open your terminal in the project's root directory and run:

```bash
docker build -t sugar-ai .
```

With GPU (using the NVIDIA Docker runtime):

```bash
docker run --gpus all -it --rm sugar-ai
```

CPU-only:

```bash
docker run -it --rm sugar-ai
```

The container starts by executing `main.py`. To change the startup behavior, update the Dockerfile accordingly.
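If you script container management from Python rather than the shell, the same runs can be expressed with the Docker SDK (`pip install docker`). This is a minimal sketch, not part of Sugar-AI itself; the interactive `-it` behavior is omitted:

```python
import docker

client = docker.from_env()

# CPU-only, roughly: docker run --rm sugar-ai
client.containers.run("sugar-ai", remove=True)

# All GPUs, roughly: docker run --gpus all --rm sugar-ai
client.containers.run(
    "sugar-ai",
    remove=True,
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
)
```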
The FastAPI server provides endpoints to interact with Sugar-AI. Install the dependencies and start the server:

```bash
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000
```
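Once the server is up, a quick smoke test from Python (a minimal sketch; assumes the default host and port above):

```python
import requests

# The root endpoint should return the welcome message
response = requests.get("http://localhost:8000/")
print(response.status_code)  # expect 200
print(response.text)         # the welcome message
```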
Sugar-AI provides three endpoints for different use cases:

| Endpoint | Purpose | Input Format | Features |
|---|---|---|---|
| `/ask` | RAG-enabled answers | Query parameter | Retrieval-Augmented Generation; Sugar/Pygame/GTK documentation; child-friendly responses |
| `/ask-llm` | Direct LLM without RAG | Query parameter | No document retrieval; direct model access; faster responses; default system prompt and parameters |
| `/ask-llm-prompted` | Custom prompt with advanced controls | JSON body | Custom system prompts; configurable model parameters |
-   **GET endpoint**

    Access the root URL `http://localhost:8000/` to see the welcome message.

-   **POST endpoint for asking questions (`/ask`)**

    To submit a coding question, send a POST request to `/ask` with the `question` parameter. For example:

    ```bash
    curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?"
    ```

    The API returns a JSON object with the answer.
-   **Additional POST endpoint (`/ask-llm`)**

    An alternative endpoint `/ask-llm` is available in `main.py`; it provides similar functionality with an enhanced processing pipeline for LLM interactions. To use it, send your coding-related question using:

    ```bash
    curl -X POST "http://localhost:8000/ask-llm?question=How%20do%20I%20create%20a%20Pygame%20window?"
    ```

    The response is a JSON object containing the answer generated by the language model.
-   **Advanced POST endpoint: custom prompt + generation parameters (`/ask-llm-prompted`)**

    A powerful endpoint that lets you supply a custom system prompt and fine-tune generation parameters. Unlike the other endpoints, this one:

    - Uses your own custom system prompt
    - Accepts a JSON request body with configurable model parameters
    - Provides direct LLM access without RAG

    Basic Usage:

    ```bash
    curl -X POST "http://localhost:8000/ask-llm-prompted" \
      -H "X-API-Key: sugarai2024" \
      -H "Content-Type: application/json" \
      -d '{
        "question": "How do I create a Pygame window?",
        "custom_prompt": "You are a Python expert. Provide detailed code examples with explanations."
      }'
    ```

    Advanced Usage with Generation Parameters:

    ```bash
    curl -X POST "http://localhost:8000/ask-llm-prompted" \
      -H "X-API-Key: sugarai2024" \
      -H "Content-Type: application/json" \
      -d '{
        "question": "Write a function to calculate fibonacci numbers",
        "custom_prompt": "You are a coding tutor. Explain step-by-step with comments.",
        "max_length": 1024,
        "truncation": true,
        "repetition_penalty": 1.1,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 50
      }'
    ```
Request Parameters:

- `question` (required): The question or task to process
- `custom_prompt` (required): Your custom system prompt
- `max_length` (optional, default: 1024): Maximum length of the generated response
- `truncation` (optional, default: true): Whether to truncate long inputs
- `repetition_penalty` (optional, default: 1.1): Controls repetition (1.0 = no penalty, >1.0 = less repetition)
- `temperature` (optional, default: 0.7): Controls randomness (0.0 = deterministic, 1.0 = very random)
- `top_p` (optional, default: 0.9): Nucleus sampling (0.1 = focused, 0.9 = diverse)
- `top_k` (optional, default: 50): Limits sampling to the K most likely tokens
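The same request can be issued programmatically. A sketch using `requests` (assumes the server is running locally and the example `sugarai2024` key from this guide is valid):

```python
import requests

payload = {
    "question": "How do I create a Pygame window?",
    "custom_prompt": "You are a Python expert. Provide detailed code examples.",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
}

response = requests.post(
    "http://localhost:8000/ask-llm-prompted",
    headers={"X-API-Key": "sugarai2024"},
    json=payload,  # sets Content-Type: application/json automatically
)
response.raise_for_status()
print(response.json()["answer"])
```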
Response Format:

```json
{
  "answer": "Here's how to create a Pygame window:\n\nimport pygame...",
  "user": "Admin Key",
  "quota": {"remaining": 95, "total": 100},
  "generation_params": {
    "max_length": 1024,
    "truncation": true,
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50
  }
}
```
Use Cases: Different activities can now use different system prompts and generation parameters, tailoring the model to each activity's needs.
Generation Parameter Guidelines (captured as reusable presets in the sketch below):

- For code: `temperature: 0.2-0.4, top_p: 0.8, repetition_penalty: 1.1`
- For creative content: `temperature: 0.7-0.9, top_p: 0.9, repetition_penalty: 1.2`
- For factual answers: `temperature: 0.3-0.5, top_p: 0.7, repetition_penalty: 1.0`
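A hypothetical helper for keeping these guidelines handy in client code (the preset names and midpoint values are illustrative, not part of the API):

```python
# Illustrative presets using midpoints of the recommended ranges above
GENERATION_PRESETS = {
    "code":     {"temperature": 0.3, "top_p": 0.8, "repetition_penalty": 1.1},
    "creative": {"temperature": 0.8, "top_p": 0.9, "repetition_penalty": 1.2},
    "factual":  {"temperature": 0.4, "top_p": 0.7, "repetition_penalty": 1.0},
}

def build_payload(question: str, custom_prompt: str, style: str = "code") -> dict:
    """Merge a preset into an /ask-llm-prompted request body."""
    return {"question": question, "custom_prompt": custom_prompt, **GENERATION_PRESETS[style]}
```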
Sugar-AI implements an API key-based authentication system for secure access to endpoints.

API keys are defined in the `.env` file with the following format:

```
API_KEYS={"sugarai2024": {"name": "Admin Key", "can_change_model": true}, "user_key_1": {"name": "User 1", "can_change_model": false}}
```
Each key has associated user information:

- `name`: A friendly name for the user (appears in API responses and logs)
- `can_change_model`: Boolean that controls permission to change the model
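Since the value is plain JSON, it can be parsed directly. A sketch of how such a key table might be loaded (Sugar-AI's actual loading code may differ):

```python
import json
import os

# API_KEYS holds a JSON object mapping each key to its user info
api_keys = json.loads(os.environ.get("API_KEYS", "{}"))

info = api_keys.get("sugarai2024", {})
print(info.get("name"))              # "Admin Key"
print(info.get("can_change_model"))  # True
```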
To use the authenticated endpoints, include the API key in your request headers:

```bash
curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?" \
  -H "X-API-Key: sugarai2024"
```
The response will include the user name:

```json
{
  "answer": "To create a Pygame window...",
  "user": "Admin Key"
}
```
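The equivalent authenticated call from Python (a sketch; assumes the local server and the example key):

```python
import requests

response = requests.post(
    "http://localhost:8000/ask",
    params={"question": "How do I create a Pygame window?"},
    headers={"X-API-Key": "sugarai2024"},
)
print(response.json()["user"])  # "Admin Key"
```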
Users with `can_change_model: true` permission can change the model:

```bash
curl -X POST "http://localhost:8000/change-model?model=Qwen/Qwen2-1.5B-Instruct&api_key=sugarai2024&password=sugarai2024"
```
The user name serves several purposes:
- It provides identification in API responses, helping track which user made which request
- It adds context to server logs for monitoring API usage
- It allows for more personalized interaction in multi-user environments
- It helps administrators identify which API key corresponds to which user
Sugar-AI includes several additional security features to protect the API and manage resources effectively:
Each API key has a daily request limit defined in the `.env` file:

```
MAX_DAILY_REQUESTS=100
```
The system automatically tracks usage and resets quotas daily. When testing:
-   Check the remaining quota by examining API responses:

    ```json
    {
      "answer": "Your answer here...",
      "user": "User 1",
      "quota": {"remaining": 95, "total": 100}
    }
    ```

-   Test quota enforcement by sending more than the allowed number of requests; the API returns a 429 status code when the quota is exceeded (a scripted version of this test follows below):

    ```bash
    curl -i -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: user_key_1"
    # After exceeding quota:
    # HTTP/1.1 429 Too Many Requests
    # {"detail":"Daily request quota exceeded"}
    ```
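The same enforcement test scripted in Python (a sketch; assumes `user_key_1` is a valid key and the default limit of 100):

```python
import requests

# Send requests until the daily quota trips
for i in range(105):
    r = requests.post(
        "http://localhost:8000/ask",
        params={"question": "Test"},
        headers={"X-API-Key": "user_key_1"},
    )
    if r.status_code == 429:
        print(f"Quota exceeded after {i} requests: {r.json()['detail']}")
        break
```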
Sugar-AI implements comprehensive logging for security monitoring:
- All API requests are logged with user information, IP addresses, and timestamps
- Failed authentication attempts are recorded at warning level
- Model change attempts are tracked with detailed information
- All logs are stored in `sugar_ai.log` for review
To test logging functionality:
```bash
# Make a valid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: sugarai2024"

# Make an invalid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: invalid_key"

# Check the logs
tail -f sugar_ai.log
```
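To filter the log from Python instead of `tail`, a rough sketch (the exact log line format may differ from what this assumes):

```python
# Print only warning-level entries, e.g. failed authentication attempts
with open("sugar_ai.log") as log:
    for line in log:
        if "WARNING" in line:
            print(line.rstrip())
```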
The API implements CORS (Cross-Origin Resource Sharing) and trusted host verification:
- In development mode, API access is allowed from all origins
- For production, consider restricting the `allow_origins` parameter in `main.py`, as sketched below
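A minimal sketch of a production-restricted setup, assuming the standard FastAPI `CORSMiddleware` (the origin list is a placeholder for your deployment):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow only known front-end origins instead of "*"
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.org"],  # hypothetical origin
    allow_methods=["GET", "POST"],
    allow_headers=["X-API-Key", "Content-Type"],
)
```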
The Streamlit app should be updated to include API key authentication and support for all three endpoints:
```python
# Updated streamlit.py example
import streamlit as st
import requests
import json

st.title("Sugar-AI Chat Interface")

# Add API key field
api_key = st.sidebar.text_input("API Key", type="password")

# Endpoint selection
endpoint_choice = st.selectbox(
    "Choose endpoint:",
    ["RAG (ask)", "Direct LLM (ask-llm)", "Custom Prompt (ask-llm-prompted)"]
)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

# Custom prompt section for ask-llm-prompted
custom_prompt = ""
generation_params = {}

if endpoint_choice == "Custom Prompt (ask-llm-prompted)":
    custom_prompt = st.text_area(
        "Custom Prompt:",
        value="You are a helpful assistant. Provide clear and detailed answers.",
        help="This prompt will replace the default system prompt"
    )

    # Generation parameters
    with st.expander("Advanced Generation Parameters"):
        col1, col2 = st.columns(2)
        with col1:
            max_length = st.number_input("Max Length", value=1024, min_value=100, max_value=2048)
            temperature = st.slider("Temperature", 0.0, 1.0, 0.7, 0.1)
            repetition_penalty = st.slider("Repetition Penalty", 0.5, 2.0, 1.1, 0.1)
        with col2:
            top_p = st.slider("Top P", 0.1, 1.0, 0.9, 0.1)
            top_k = st.number_input("Top K", value=50, min_value=1, max_value=100)
            truncation = st.checkbox("Truncation", value=True)

    generation_params = {
        "max_length": max_length,
        "truncation": truncation,
        "repetition_penalty": repetition_penalty,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k
    }

if st.button("Submit"):
    if question and api_key:
        headers = {"X-API-Key": api_key}
        try:
            if endpoint_choice == "RAG (ask)":
                url = "http://localhost:8000/ask"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
            elif endpoint_choice == "Direct LLM (ask-llm)":
                url = "http://localhost:8000/ask-llm"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
            else:  # Custom Prompt (ask-llm-prompted)
                url = "http://localhost:8000/ask-llm-prompted"
                headers["Content-Type"] = "application/json"
                data = {
                    "question": question,
                    "custom_prompt": custom_prompt,
                    **generation_params
                }
                response = requests.post(url, headers=headers, data=json.dumps(data))

            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
                st.sidebar.info(f"User: {result.get('user', 'Unknown')}")
                quota = result.get("quota", {})
                st.sidebar.info(f"Remaining quota: {quota.get('remaining', '?')}/{quota.get('total', '?')}")

                # Show generation parameters for the custom prompt endpoint
                if endpoint_choice == "Custom Prompt (ask-llm-prompted)" and "generation_params" in result:
                    with st.expander("Generation Parameters Used"):
                        st.json(result["generation_params"])
            else:
                st.error(f"Error {response.status_code}: {response.text}")
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
    elif not question:
        st.warning("Please enter a question.")
    elif not api_key:
        st.warning("Please enter an API key.")
```
Run this updated Streamlit app to test the complete authentication flow and quota visibility.
To test the new RAG Agent directly from the CLI, execute:

```bash
python rag_agent.py --quantize
```

Remove the `--quantize` flag if you prefer running without 4-bit quantization.
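For reference, 4-bit quantization in the Hugging Face stack is usually enabled through `bitsandbytes`. A sketch of what a flag like `--quantize` might toggle (Sugar-AI's actual implementation may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",  # example model mentioned in this guide
    quantization_config=bnb_config,
    device_map="auto",
)
```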
-   **Verify Model Setup:**

    - Confirm the selected model loads correctly by checking the terminal output for errors.

-   **Document Retrieval:**

    - Place your documents (PDF or text files) in the directory specified in the default parameters, or provide your own paths with the `--docs` flag.
    - The vector store is rebuilt every time the agent starts, so make sure your documents are in place before launching so relevant content can be retrieved.

-   **Question Handling:**

    - After the agent starts, enter a sample coding-related question.
    - The assistant should respond by incorporating context from the loaded documents into its answer.

-   **API and Docker Route:**

    - Optionally, combine these changes by deploying the updated version via Docker and testing the FastAPI endpoints as described above.
If you encounter CUDA out-of-memory errors, consider running the agent on CPU or adjusting the CUDA allocator settings:

```bash
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
Review the terminal output for further details and error messages.
Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.
-   **Install Streamlit:**

    If you haven't already, install Streamlit:

    ```bash
    pip install streamlit
    ```

-   **Start the Server:**

    Make sure the FastAPI server is running:

    ```bash
    uvicorn main:app --host 0.0.0.0 --port 8000
    ```

-   **Start the App:**

    Create a `streamlit.py` file with the following content:

    ```python
    # ./streamlit.py
    import streamlit as st
    import requests

    st.title("Sugar-AI Chat Interface")

    use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)

    st.subheader("Ask Sugar-AI")
    question = st.text_input("Enter your question:")

    if st.button("Submit"):
        if question:
            if use_rag:
                url = "http://localhost:8000/ask"
            else:
                url = "http://localhost:8000/ask-llm"
            params = {"question": question}
            try:
                response = requests.post(url, params=params)
                if response.status_code == 200:
                    result = response.json()
                    st.markdown("**Answer:** " + result["answer"])
                else:
                    st.error(f"Error {response.status_code}: {response.text}")
            except Exception as e:
                st.error(f"Error contacting the API: {e}")
        else:
            st.warning("Please enter a question.")
    ```

    Then launch it:

    ```bash
    streamlit run streamlit.py
    ```
-   **Using the App:**

    - The app provides a simple UI for entering coding questions and displays Sugar-AI's response.
    - Use the sidebar options to configure settings if available.
    - The app communicates with the FastAPI backend to process and retrieve answers.
Enjoy exploring Sugar-AI through both API endpoints and the interactive Streamlit interface!