This project is inspired by a video from Matthew Berman on his YouTube channel. He identified a pressing need for better optimization and cost savings in AI development: AI product companies and large organizations take a 'naive approach' by relying solely on large language models (LLMs) from providers like OpenAI, which is expensive and inflexible.
His idea, Abstract AI, is an abstraction layer on top of large language models. It tackles the inefficiency and high cost of using a single, often expensive, model for every task, all behind a single drop-in API endpoint. By routing each prompt to the best-suited model, Abstract AI aims to lower latency, cut costs, and offer more flexibility.
AI developers are currently facing several challenges:
- Overpaying for Premium Models: Companies often use high-cost models like GPT-4o even when simpler models could do the job.
- Platform Risk: Relying on a single provider exposes them to changes in policies or pricing.
- High Latency and Costs: Using the most advanced models for all tasks results in slow and expensive operations.
- Underutilization of Algorithmic Techniques: Many don't use advanced techniques like Chain of Thought or Mixture of Agents, leading to inefficient LLM usage.
Abstract AI addresses these issues by providing a single, drop-in API that connects to multiple large language models, including both open-source and proprietary models. It uses RouteLLM to find the best model for each prompt based on cost, speed, and quality.
- Multi-Model Routing: Uses RouteLLM to route prompts to the best-suited model, whether it’s a high-powered proprietary model or a smaller, local model.
- Cost Efficiency: Cuts costs by up to 80% while maintaining 90% of the quality of top-tier models.
- Flexibility: Supports multiple models, providing flexibility and reducing platform risk.
- Built-In Caching: Speeds up response times and reduces costs by caching frequent queries (a minimal sketch follows this list).
- User Management: Securely handles API keys and user permissions.
- Prompt Management: Tracks and versions prompts for better management and optimization.
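To make the caching idea concrete, here is a minimal sketch of what a response cache along the lines of cache.py might look like; the class and method names are hypothetical, not the project's actual API:

```python
import hashlib

class ResponseCache:
    """Minimal in-memory response cache (hypothetical sketch, not the real cache.py)."""

    def __init__(self):
        self._store = {}

    def _key(self, prompt: str) -> str:
        # Hash the prompt so arbitrarily long prompts map to fixed-size keys.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self._key(prompt))

    def set(self, prompt: str, response: str) -> None:
        self._store[self._key(prompt)] = response
```

A repeated prompt can then be answered without touching any model at all, which is where the latency and cost savings on frequent queries come from.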
```
abstract_ai/
│
├── app.py                        # Main Flask application entry point
├── route_prompt.py               # Routes prompts to the appropriate language model using RouteLLM
├── cache.py                      # Implements a simple caching mechanism for responses
├── user_management.py            # Handles user API key encryption and validation
├── prompt_manager.py             # Manages saving and retrieving user prompts
├── generate_encryption_key.py    # Script to generate an encryption key for securing API keys
├── requirements.txt              # Lists the required Python packages for the project
└── README.md                     # Project overview, setup instructions, and documentation
```
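To show how these modules fit together, here is a simplified, self-contained sketch of the kind of wiring app.py implies. The in-memory stand-ins below are hypothetical placeholders for the real cache, user-management, and prompt-manager modules, not the project's actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory stand-ins for the real modules, for illustration only.
VALID_KEYS = {"api_key_user1"}   # user_management.py
_cache: dict = {}                # cache.py
_prompts: dict = {}              # prompt_manager.py

def route_to_model(prompt: str) -> str:
    # In the real project this would call the RouteLLM controller (route_prompt.py).
    return f"echo: {prompt}"

@app.route("/route_prompt", methods=["POST"])
def route_prompt():
    # Reject requests that don't carry a valid API key.
    if request.headers.get("x-api-key") not in VALID_KEYS:
        return jsonify({"error": "invalid API key"}), 401
    data = request.get_json()
    prompt, user_id = data["prompt"], data["user_id"]
    # Serve from cache when possible; otherwise route the prompt and remember the answer.
    response = _cache.get(prompt)
    if response is None:
        response = route_to_model(prompt)
        _cache[prompt] = response
    _prompts.setdefault(user_id, []).append(prompt)  # track prompts per user
    return jsonify({"response": response})

if __name__ == "__main__":
    app.run(port=5000)
```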
- Clone the Repository:

  ```bash
  git clone https://github.com/laodev1/abstract_ai.git
  cd abstract_ai
  ```

- Set Up the Virtual Environment:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Generate and Set the Encryption Key (a sketch of what this script might do follows this list):

  ```bash
  python generate_encryption_key.py
  ```

  Copy the printed key and set it as an environment variable:

  ```bash
  export ENCRYPTION_KEY="your_generated_key"
  ```

- Set the OPENAI_API_KEY Environment Variable: Grab your OpenAI API key and set it as an environment variable:

  ```bash
  export OPENAI_API_KEY="your_openai_key"
  ```
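The contents of generate_encryption_key.py aren't reproduced here, but a typical implementation uses Fernet from the cryptography package. A minimal sketch, assuming that approach, which generates a key and round-trips a sample value through it:

```python
from cryptography.fernet import Fernet

# Generate a fresh symmetric key and print it so it can be exported
# as the ENCRYPTION_KEY environment variable.
key = Fernet.generate_key()
print(key.decode())

# Sanity check: encrypt and decrypt a sample API key with the new key.
f = Fernet(key)
token = f.encrypt(b"some_user_api_key")
assert f.decrypt(token) == b"some_user_api_key"
```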
This code uses Ollama to run llama3.1 locally, as per these docs. If you want to use any other model, simply change it in `route_prompt.py`:

```python
from routellm.controller import Controller

controller = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",
    weak_model="ollama_chat/llama3.1",  # change me
)
```
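Once configured, the Controller exposes an OpenAI-compatible completion interface; in RouteLLM, the router name and a cost/quality threshold are encoded in the model string. A minimal usage sketch, using the example threshold from the RouteLLM docs (you would calibrate it for your own traffic):

```python
response = controller.chat.completions.create(
    # Router "mf" with threshold 0.11593 (the example value from the RouteLLM
    # docs); the threshold tunes how often the strong model is chosen.
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
```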
- Run the Flask Application:

  ```bash
  python app.py
  ```

- Test the `/route_prompt` endpoint:

  ```bash
  curl -X POST http://localhost:5000/route_prompt \
    -H "Content-Type: application/json" \
    -H "x-api-key: api_key_user1" \
    -d '{"prompt": "Hello, how are you?", "user_id": "user1"}'
  ```

- Test the `/prompts` endpoint:

  ```bash
  curl -X GET "http://localhost:5000/prompts?user_id=user1" \
    -H "x-api-key: api_key_user1"
  ```
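If you prefer testing from Python, the same `/route_prompt` call looks like this (the requests package is assumed to be installed; the demo key is the hardcoded one from user_management.py):

```python
import requests

resp = requests.post(
    "http://localhost:5000/route_prompt",
    headers={"x-api-key": "api_key_user1"},  # demo key from user_management.py
    json={"prompt": "Hello, how are you?", "user_id": "user1"},
)
print(resp.json())  # e.g. {"response": "..."}
```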
To expose your local server over HTTPS with ngrok:

- For first-time use, set your auth token from ngrok (you'll only have to do this once):

  ```bash
  ngrok config add-authtoken <TOKEN>
  ```

- Start ngrok:

  ```bash
  ngrok http 5000
  ```

- Access the Secure URL: ngrok will provide you with a secure HTTPS URL which you can use to access your application remotely.
- Endpoint: `/route_prompt`
- Method: POST
- Headers:
  - `x-api-key`: Your API key. Note: this is hardcoded as `api_key_user1` inside user_management.py for demo purposes only.
- Request Body:

  ```json
  { "prompt": "Your prompt here", "user_id": "user1" }
  ```

- Response:
{ "response": "AI response" }
- Endpoint: `/prompts`
- Method: GET
- Headers:
  - `x-api-key`: Your API key. Note: this is hardcoded as `api_key_user1` inside user_management.py for demo purposes only.
- Query Params:
  - `user_id`: The user ID
- Response:

  ```json
  { "prompts": ["Prompt 1", "Prompt 2"] }
  ```