
@GeekRicardo

Description

This PR enhances the OpenAI-compatible API server by adding model-specific prompt formatting with full conversation history support. Different AI models require different prompt structures and special tokens to function correctly, especially when handling multi-turn conversations. This change ensures that prompts are properly formatted based on the model being used while preserving the entire conversation context.

Changes

Added model-specific formatting functions with context support:

  • formatLlama3() - Handles multi-turn conversations with <|begin_of_text|>, <|start_header_id|>, and <|eot_id|> tokens
  • formatLlama2() - Preserves conversation history using [INST], <<SYS>> tokens
  • formatMistral() - Maintains context across multiple user/assistant interactions
  • formatClaude() - Formats entire conversation history in Claude's conversational style
  • formatGrok() - Supports system instructions and conversation context
  • formatSimpleChat() - Fallback that preserves all messages for OpenAI/Gemini
  • formatDefault() - Generic format maintaining full message history
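As an illustration of what one of these formatters might look like, here is a sketch of formatLlama3 following the standard Llama 3 chat template (a hypothetical reconstruction, not the PR's actual code):

```javascript
// Sketch of a Llama 3 formatter: every message becomes a header-delimited
// turn, and the prompt ends with an open assistant header so the model
// generates the next reply. Assumes roles are 'system'/'user'/'assistant'.
function formatLlama3(messages) {
  let prompt = '<|begin_of_text|>';
  for (const m of messages) {
    prompt += `<|start_header_id|>${m.role}<|end_header_id|>\n\n${m.content}<|eot_id|>`;
  }
  // Cue the model to produce the next assistant turn.
  prompt += '<|start_header_id|>assistant<|end_header_id|>\n\n';
  return prompt;
}
```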

Main function:

  • formatPromptForModel() - Processes entire message array while preserving conversation flow
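The routing might look roughly like this (a sketch, not the PR's actual code; the per-family formatters are stubbed with one generic formatDefault so the block runs standalone, and the comments name the formatter each branch would call in the real implementation):

```javascript
// Generic fallback: role-prefixed lines plus a trailing assistant cue.
function formatDefault(messages) {
  return messages.map((m) => `${m.role}: ${m.content}`).join('\n') + '\nassistant:';
}

// Hypothetical dispatch by model-name substring.
function formatPromptForModel(messages, model) {
  const name = String(model || '').toLowerCase();
  if (/llama-?3/.test(name)) return formatDefault(messages);          // formatLlama3
  if (/llama/.test(name)) return formatDefault(messages);             // formatLlama2
  if (/mistral|codestral/.test(name)) return formatDefault(messages); // formatMistral
  if (/claude/.test(name)) return formatDefault(messages);            // formatClaude
  if (/grok/.test(name)) return formatDefault(messages);              // formatGrok
  if (/gpt|gemini/.test(name)) return formatDefault(messages);        // formatSimpleChat
  return formatDefault(messages);                                     // formatDefault
}
```

Note that more specific patterns (llama-3) must be checked before broader ones (llama) so a Llama 3 model is not routed to the Llama 2 formatter.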

Key Features

🔄 Full Conversation History Support

  • Processes entire message arrays (system, user, assistant messages)
  • Maintains proper conversation flow and context
  • Preserves message order and role information

🎯 Model-Specific Context Handling

Each model family has unique requirements for handling conversation history:

  • Llama models: Properly chains messages with end-of-turn tokens
  • Claude: Maintains conversational format with clear role prefixes
  • Mistral: Uses instruction tags to separate conversation turns
  • System messages: Handled according to each model's requirements
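For instance, the Llama 2 family folds the system message into the first instruction block, and closes each completed exchange with </s> before opening the next one. A sketch of that chaining (hypothetical; the PR's actual formatLlama2 may differ):

```javascript
// Sketch of a Llama 2 formatter: the system prompt is wrapped in <<SYS>>
// tags inside the FIRST [INST] block only; each finished user/assistant
// exchange is sealed with </s> before the next turn opens.
function formatLlama2(messages) {
  let system = '';
  const turns = [];
  for (const m of messages) {
    if (m.role === 'system') system = m.content;
    else turns.push(m);
  }
  let prompt = '';
  let current = '';
  for (const m of turns) {
    if (m.role === 'user') {
      const sys = prompt === '' && system
        ? `<<SYS>>\n${system}\n<</SYS>>\n\n`
        : '';
      current = `<s>[INST] ${sys}${m.content} [/INST]`;
    } else if (m.role === 'assistant') {
      prompt += `${current} ${m.content} </s>`;
      current = '';
    }
  }
  // The final (unanswered) user turn stays open for the model to complete.
  return prompt + current;
}
```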

Motivation

When using the OpenAI-compatible API endpoint with different models through Raycast AI, maintaining conversation context is crucial. Without proper formatting:

  • Models lose track of previous interactions
  • System instructions may be ignored or misplaced
  • Multi-turn conversations produce inconsistent results
  • Context windows are not utilized effectively

Testing

Tested with multi-turn conversations on:

  • Llama 2/3/3.1 with system prompts and conversation history
  • Mistral/Codestral with multiple user/assistant exchanges
  • Claude with system instructions and long conversations
  • OpenAI format with full message history
  • Generic fallback maintaining all context

Example

Before this change:

```javascript
// Only last message or simple concatenation
const prompt = messages[messages.length - 1].content;
```

After this change:

```javascript
// Full conversation with proper formatting
const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is Python?' },
  { role: 'assistant', content: 'Python is a programming language...' },
  { role: 'user', content: 'Can you give me an example?' }
];

const prompt = formatPromptForModel(messages, 'claude');
// Returns properly formatted conversation with full context:
// You are a helpful assistant.
// 
// User: What is Python?
// 
// Assistant: Python is a programming language...
// 
// User: Can you give me an example?
// 
// Assistant:
```
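The Claude-style output above could be produced by a formatter along these lines (a sketch of what formatClaude might do, not the PR's actual code):

```javascript
// Sketch: system content is emitted bare, user/assistant turns get role
// prefixes, turns are separated by blank lines, and a trailing "Assistant:"
// cues the model to respond.
function formatClaude(messages) {
  const parts = [];
  for (const m of messages) {
    if (m.role === 'system') parts.push(m.content);
    else if (m.role === 'user') parts.push(`User: ${m.content}`);
    else if (m.role === 'assistant') parts.push(`Assistant: ${m.content}`);
  }
  parts.push('Assistant:');
  return parts.join('\n\n');
}
```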

Breaking Changes

None. This change is backwards compatible and only enhances the existing functionality.

Benefits

✅ Better context understanding: Models can reference previous messages
✅ Improved response quality: Full conversation context leads to more coherent responses
✅ System instruction support: Properly handles system messages for each model
✅ Multi-turn conversations: Enables complex, contextual interactions
✅ Model compatibility: Each model receives its optimal format

Related Issues

  • Improves model compatibility for OpenAI-compatible API usage
  • Addresses prompt formatting issues with conversation history
  • Enables proper context handling for multi-turn conversations

Checklist

  • Code follows the project's style guidelines
  • All comments are in English
  • Functions are properly documented with JSDoc
  • Handles full message arrays with conversation history
  • Preserves context across multiple conversation turns
  • No breaking changes introduced
  • Tested with multi-turn conversations on multiple model types

Disclosure

This code enhancement was developed with AI assistance. The PR description and documentation were also generated using AI to ensure comprehensive coverage of the changes and their implications.
