
Sugar-AI Project

This document describes how to run Sugar-AI, test recent changes, and troubleshoot common issues.

Running Sugar-AI with Docker

Sugar-AI provides a Docker-based deployment option for an isolated and reproducible environment.

Build the Docker image

Open your terminal in the project's root directory and run:

docker build -t sugar-ai .

Run the Docker container

  • With GPU (using NVIDIA Docker runtime):

    docker run --gpus all -it --rm sugar-ai
  • CPU-only:

    docker run -it --rm sugar-ai

The container starts by executing main.py. To change the startup behavior, update the Dockerfile accordingly.
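For example, a CMD line along these lines in the Dockerfile (a sketch; the base image, dependency layers, and any ENTRYPOINT in the actual Dockerfile are assumed to stay as they are) would start the FastAPI server instead of running main.py directly:

```dockerfile
# Hypothetical override: serve the API on container start instead of main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```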

Testing the FastAPI App

The FastAPI server provides endpoints to interact with Sugar-AI.

Install dependencies

pip install -r requirements.txt

Run the server

uvicorn main:app --host 0.0.0.0 --port 8000

Test API endpoints

Sugar-AI provides three different endpoints for different use cases:

| Endpoint | Purpose | Input Format | Features |
|----------|---------|--------------|----------|
| /ask | RAG-enabled answers | Query parameter | Retrieval-Augmented Generation; Sugar/Pygame/GTK documentation; child-friendly responses |
| /ask-llm | Direct LLM without RAG | Query parameter | No document retrieval; direct model access; faster responses; default system prompt and parameters |
| /ask-llm-prompted | Custom prompt with advanced controls | JSON body | Custom system prompts; configurable model parameters |
  • GET endpoint

    Access the root URL:
    http://localhost:8000/ to see the welcome message.

  • POST endpoint for asking questions

    To submit a coding question, send a POST request to /ask with the question parameter. For example:

    curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?"

    The API returns a JSON object with the answer.
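The same request can be issued from Python. A minimal sketch using only the standard library to build the encoded URL (the base URL follows the examples in this document):

```python
import urllib.parse

def build_ask_url(base: str, question: str) -> str:
    # URL-encode the question exactly as the curl example does
    return f"{base}/ask?question={urllib.parse.quote(question)}"

url = build_ask_url("http://localhost:8000", "How do I create a Pygame window?")
# A POST to this URL (e.g. via urllib.request or requests, plus the
# X-API-Key header) returns the JSON object with the answer.
```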

  • Additional POST endpoint (/ask-llm)

    An alternative endpoint, /ask-llm, is available in main.py. It sends your question directly to the language model without document retrieval, which typically makes responses faster. To use it, send your coding-related question using:

    curl -X POST "http://localhost:8000/ask-llm?question=How%20do%20I%20create%20a%20Pygame%20window?"

    The response format is JSON containing the answer generated by the language model.

  • Advanced POST endpoint - Custom prompt + generation parameters (/ask-llm-prompted)

    A powerful endpoint that allows you to use custom prompts and fine-tune generation parameters. Unlike the other endpoints, this one:

    • Uses your own custom system prompt
    • Accepts JSON request body with configurable model parameters
    • Provides direct LLM access without RAG

    Basic Usage:

    curl -X POST "http://localhost:8000/ask-llm-prompted" \
      -H "X-API-Key: sugarai2024" \
      -H "Content-Type: application/json" \
      -d '{
        "question": "How do I create a Pygame window?",
        "custom_prompt": "You are a Python expert. Provide detailed code examples with explanations."
      }'

    Advanced Usage with Generation Parameters:

    curl -X POST "http://localhost:8000/ask-llm-prompted" \
      -H "X-API-Key: sugarai2024" \
      -H "Content-Type: application/json" \
      -d '{
        "question": "Write a function to calculate fibonacci numbers",
        "custom_prompt": "You are a coding tutor. Explain step-by-step with comments.",
        "max_length": 1024,
        "truncation": true,
        "repetition_penalty": 1.1,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 50
      }'

    Request Parameters:

    • question (required): The question or task to process
    • custom_prompt (required): Your custom system prompt
    • max_length (optional, default: 1024): Maximum length of generated response
    • truncation (optional, default: true): Whether to truncate long inputs
    • repetition_penalty (optional, default: 1.1): Controls repetition (1.0 = no penalty, >1.0 = less repetition)
    • temperature (optional, default: 0.7): Controls randomness (0.0 = deterministic, 1.0 = very random)
    • top_p (optional, default: 0.9): Nucleus sampling (0.1 = focused, 0.9 = diverse)
    • top_k (optional, default: 50): Limits vocabulary to K most likely words
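The defaults above can be captured in a small helper that merges request overrides onto the documented defaults (a sketch, not the server's actual implementation):

```python
# Defaults as documented for /ask-llm-prompted
DEFAULT_GENERATION_PARAMS = {
    "max_length": 1024,
    "truncation": True,
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
}

def build_generation_params(overrides=None):
    # Start from the documented defaults, then apply any request overrides
    params = dict(DEFAULT_GENERATION_PARAMS)
    params.update(overrides or {})
    return params
```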

    Response Format:

    {
      "answer": "Here's how to create a Pygame window:\n\nimport pygame...",
      "user": "Admin Key",
      "quota": {"remaining": 95, "total": 100},
      "generation_params": {
        "max_length": 1024,
        "truncation": true,
        "repetition_penalty": 1.1,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 50
      }
    }

    Use Cases: Different activities can use different system prompts and generation parameters, tailoring the model's behavior to each activity's needs.

    Generation Parameter Guidelines:

    • For Code: temperature: 0.2-0.4, top_p: 0.8, repetition_penalty: 1.1
    • For Creative Content: temperature: 0.7-0.9, top_p: 0.9, repetition_penalty: 1.2
    • For Factual Answers: temperature: 0.3-0.5, top_p: 0.7, repetition_penalty: 1.0
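These guidelines can be packaged as client-side presets for /ask-llm-prompted; the values below are illustrative midpoints of the ranges above, not settings from the project:

```python
# Illustrative presets derived from the guideline ranges above
GENERATION_PRESETS = {
    "code":     {"temperature": 0.3, "top_p": 0.8, "repetition_penalty": 1.1},
    "creative": {"temperature": 0.8, "top_p": 0.9, "repetition_penalty": 1.2},
    "factual":  {"temperature": 0.4, "top_p": 0.7, "repetition_penalty": 1.0},
}

def request_body(question, custom_prompt, preset="code"):
    # Build the JSON body expected by /ask-llm-prompted
    return {"question": question, "custom_prompt": custom_prompt,
            **GENERATION_PRESETS[preset]}
```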

API Authentication

Sugar-AI implements an API key-based authentication system for secure access to endpoints.

Setting Up Authentication

API keys are defined in the .env file with the following format:

API_KEYS={"sugarai2024": {"name": "Admin Key", "can_change_model": true}, "user_key_1": {"name": "User 1", "can_change_model": false}}

Each key has associated user information:

  • name: A friendly name for the user (appears in API responses and logs)
  • can_change_model: Boolean that controls permission to change the model
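How the server consumes this variable is defined in main.py; as a sketch, the JSON value can be parsed and checked like this (helper names here are hypothetical):

```python
import json
import os

# Example value in the format shown above
os.environ.setdefault(
    "API_KEYS",
    '{"sugarai2024": {"name": "Admin Key", "can_change_model": true},'
    ' "user_key_1": {"name": "User 1", "can_change_model": false}}',
)

def load_api_keys():
    # API_KEYS holds a JSON object mapping key -> user info
    return json.loads(os.environ["API_KEYS"])

def can_change_model(api_key):
    info = load_api_keys().get(api_key)
    return bool(info and info.get("can_change_model"))
```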

Testing Authentication

To use the authenticated endpoints, include the API key in your request headers:

curl -X POST "http://localhost:8000/ask?question=How%20do%20I%20create%20a%20Pygame%20window?" \
  -H "X-API-Key: sugarai2024"

The response will include the user name:

{
  "answer": "To create a Pygame window...",
  "user": "Admin Key"
}

Changing Models (Admin Only)

Users with can_change_model: true permission can change the model:

curl -X POST "http://localhost:8000/change-model?model=Qwen/Qwen2-1.5B-Instruct&api_key=sugarai2024&password=sugarai2024"

Why User Names Are Useful

The user name serves several purposes:

  1. It provides identification in API responses, helping track which user made which request
  2. It adds context to server logs for monitoring API usage
  3. It allows for more personalized interaction in multi-user environments
  4. It helps administrators identify which API key corresponds to which user

Advanced Security Features

Sugar-AI includes several additional security features to protect the API and manage resources effectively:

Request Quotas

Each API key has a daily request limit defined in the .env file:

MAX_DAILY_REQUESTS=100

The system automatically tracks usage and resets quotas daily. When testing:

  1. Check remaining quota by examining API responses:

    {
      "answer": "Your answer here...",
      "user": "User 1",
      "quota": {"remaining": 95, "total": 100}
    }
  2. Test quota enforcement by sending more than the allowed number of requests. The API will return a 429 status code when the quota is exceeded:

    curl -i -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: user_key_1"
    # After exceeding quota:
    # HTTP/1.1 429 Too Many Requests
    # {"detail":"Daily request quota exceeded"}
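Server-side, quota enforcement amounts to a per-key daily counter. A minimal sketch (the actual bookkeeping in main.py may differ):

```python
from datetime import date

class QuotaTracker:
    """Per-key daily request counter; returns None once the quota is spent."""

    def __init__(self, max_daily=100):
        self.max_daily = max_daily
        self.day = date.today()
        self.counts = {}

    def check_and_increment(self, api_key):
        today = date.today()
        if today != self.day:
            # A new day has started: reset all counters
            self.day, self.counts = today, {}
        used = self.counts.get(api_key, 0)
        if used >= self.max_daily:
            return None  # caller should answer with HTTP 429
        self.counts[api_key] = used + 1
        return self.max_daily - self.counts[api_key]
```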

Security Logging

Sugar-AI implements comprehensive logging for security monitoring:

  1. All API requests are logged with user information, IP addresses, and timestamps
  2. Failed authentication attempts are recorded with warning level
  3. Model change attempts are tracked with detailed information
  4. All logs are stored in sugar_ai.log for review
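A logging setup along these lines would produce such a file (a sketch; the exact format used by main.py may differ):

```python
import logging

logger = logging.getLogger("sugar_ai")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("sugar_ai.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# Successful requests are logged with user info, client IP, and timestamp
logger.info("request user=%s ip=%s endpoint=/ask", "Admin Key", "127.0.0.1")
# Failed authentication attempts are recorded at warning level
logger.warning("failed authentication ip=%s", "127.0.0.1")
```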

To test logging functionality:

# Make a valid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: sugarai2024"

# Make an invalid request
curl -X POST "http://localhost:8000/ask?question=Test" -H "X-API-Key: invalid_key"

# Check the logs
tail -f sugar_ai.log

CORS and Trusted Hosts

The API implements CORS (Cross-Origin Resource Sharing) and trusted host verification:

  • In development mode, API access is allowed from all origins
  • For production, consider restricting the allow_origins parameter in main.py

Testing with Streamlit App

The Streamlit app should be updated to include API key authentication and support for all three endpoints:

# Updated streamlit.py example
import streamlit as st
import requests
import json

st.title("Sugar-AI Chat Interface")

# Add API key field
api_key = st.sidebar.text_input("API Key", type="password")

# Endpoint selection
endpoint_choice = st.selectbox(
    "Choose endpoint:",
    ["RAG (ask)", "Direct LLM (ask-llm)", "Custom Prompt (ask-llm-prompted)"]
)

st.subheader("Ask Sugar-AI")
question = st.text_input("Enter your question:")

# Custom prompt section for ask-llm-prompted
custom_prompt = ""
generation_params = {}

if endpoint_choice == "Custom Prompt (ask-llm-prompted)":
    custom_prompt = st.text_area(
        "Custom Prompt:", 
        value="You are a helpful assistant. Provide clear and detailed answers.",
        help="This prompt will replace the default system prompt"
    )
    
    # Generation parameters
    with st.expander("Advanced Generation Parameters"):
        col1, col2 = st.columns(2)
        
        with col1:
            max_length = st.number_input("Max Length", value=1024, min_value=100, max_value=2048)
            temperature = st.slider("Temperature", 0.0, 1.0, 0.7, 0.1)
            repetition_penalty = st.slider("Repetition Penalty", 0.5, 2.0, 1.1, 0.1)
        
        with col2:
            top_p = st.slider("Top P", 0.1, 1.0, 0.9, 0.1)
            top_k = st.number_input("Top K", value=50, min_value=1, max_value=100)
            truncation = st.checkbox("Truncation", value=True)
    
    generation_params = {
        "max_length": max_length,
        "truncation": truncation,
        "repetition_penalty": repetition_penalty,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k
    }

if st.button("Submit"):
    if question and api_key:
        headers = {"X-API-Key": api_key}
        
        try:
            if endpoint_choice == "RAG (ask)":
                url = "http://localhost:8000/ask"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
                
            elif endpoint_choice == "Direct LLM (ask-llm)":
                url = "http://localhost:8000/ask-llm"
                params = {"question": question}
                response = requests.post(url, params=params, headers=headers)
                
            elif endpoint_choice == "Custom Prompt (ask-llm-prompted)":
                url = "http://localhost:8000/ask-llm-prompted"
                data = {
                    "question": question,
                    "custom_prompt": custom_prompt,
                    **generation_params
                }
                # requests sets the JSON Content-Type header when using json=
                response = requests.post(url, headers=headers, json=data)
            
            if response.status_code == 200:
                result = response.json()
                st.markdown("**Answer:** " + result["answer"])
                st.sidebar.info(f"User: {result.get('user', 'Unknown')}")
                if "quota" in result:
                    st.sidebar.info(f"Remaining quota: {result['quota']['remaining']}/{result['quota']['total']}")
                
                # Show generation parameters for custom prompt endpoint
                if endpoint_choice == "Custom Prompt (ask-llm-prompted)" and "generation_params" in result:
                    with st.expander("Generation Parameters Used"):
                        st.json(result["generation_params"])
                        
            else:
                st.error(f"Error {response.status_code}: {response.text}")
                
        except Exception as e:
            st.error(f"Error contacting the API: {e}")
            
    elif not question:
        st.warning("Please enter a question.")
    elif not api_key:
        st.warning("Please enter an API key.")

Run this updated Streamlit app to test the complete authentication flow and quota visibility.

Running the RAG Agent from the Command Line

To test the new RAG Agent directly from the CLI, execute:

python rag_agent.py --quantize

Remove the --quantize flag if you prefer running without 4‑bit quantization.
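The command line above can be wired up with argparse. A sketch (the flag definitions in the actual rag_agent.py may differ; --docs is the flag mentioned below under document retrieval):

```python
import argparse

def build_parser():
    # Flags mirroring the CLI usage shown above; details are assumptions
    parser = argparse.ArgumentParser(description="Run the Sugar-AI RAG agent")
    parser.add_argument("--quantize", action="store_true",
                        help="load the model with 4-bit quantization")
    parser.add_argument("--docs", nargs="*", default=[],
                        help="paths to PDF/text documents to index")
    return parser

args = build_parser().parse_args(["--quantize"])
```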

Testing the New Features

  1. Verify Model Setup:

    • Confirm the selected model loads correctly by checking the terminal output for any errors.
  2. Document Retrieval:

    • Place your documents (PDF or text files) in the directory specified in the default parameters or provide your paths using the --docs flag.
    • The vector store is rebuilt every time the agent starts, so make sure your documents are in place beforehand so that relevant content can be retrieved.
  3. Question Handling:

    • After the agent starts, enter a sample coding-related question.
    • The assistant should respond by incorporating context from the loaded documents and answering your query.
  4. API and Docker Route:

    • Optionally, combine these changes by deploying the updated version via Docker and testing the FastAPI endpoints as described above.

Troubleshooting CUDA Memory Issues

If you encounter CUDA out-of-memory errors, consider running the agent on CPU or adjust CUDA settings:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Review the terminal output for further details and error messages.

Using the Streamlit App

Sugar-AI also provides a Streamlit-based interface for quick interactions and visualizations.

Running the Streamlit App

  1. Install Streamlit:

    If you haven't already, install Streamlit:

    pip install streamlit
  2. Make sure the FastAPI server is running:

    uvicorn main:app --host 0.0.0.0 --port 8000
  3. Start the App:

    Create a streamlit.py file with the following content:

    #./streamlit.py
    import streamlit as st
    import requests
    
    st.title("Sugar-AI Chat Interface")
    
    use_rag = st.checkbox("Use RAG (Retrieval-Augmented Generation)", value=True)
    
    st.subheader("Ask Sugar-AI")
    question = st.text_input("Enter your question:")
    
    if st.button("Submit"):
        if question:
            if use_rag:
                url = "http://localhost:8000/ask"
            else:
                url = "http://localhost:8000/ask-llm"
            params = {"question": question}
            try:
                response = requests.post(url, params=params)
                if response.status_code == 200:
                    result = response.json()
                    st.markdown("**Answer:** " + result["answer"])
                else:
                    st.error(f"Error {response.status_code}: {response.text}")
            except Exception as e:
                st.error(f"Error contacting the API: {e}")
        else:
            st.warning("Please enter a question.")

    Then launch the app:

    streamlit run streamlit.py
  4. Using the App:

    • The app provides a simple UI to input coding questions and displays the response using Sugar-AI.
    • Use the sidebar options to configure settings if available.
    • The app communicates with the FastAPI backend to process and retrieve answers.

Streamlit UI

Enjoy exploring Sugar-AI through both API endpoints and the interactive Streamlit interface!

About

AI Module used by Sugar Activities.
