Research Assistant AI System

GenAI Labs BE/AI Candidate Challenge - Complete Implementation

An AI-powered research assistant with semantic search, content generation, and JWT authentication with role-based access control, built with FastAPI and a Vite-based JavaScript frontend.

🚀 Quick Start

Prerequisites

Python 3.8+
Node.js 16+
OpenAI API Key
Pinecone Local (Docker)

1. Start Pinecone Local

# Start Pinecone Local container
docker run -p 8080:8080 pinecone/pinecone-local:latest

2. Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# OPENAI_API_KEY=your_openai_key
# PINECONE_API_KEY=pclocal
# PINECONE_ENVIRONMENT=localhost:8080

# Clear database for fresh start (optional)
python clear_database.py

# Start the backend server
uvicorn main:app --host 0.0.0.0 --port 8006 --reload

3. Frontend Setup

# In a new terminal, navigate to the frontend directory from the repository root
cd frontend

# Install dependencies
npm install

# Start the development server
npm run dev

4. Access the Application

Frontend: http://localhost:3001
Backend API: http://localhost:8006
API Documentation: http://localhost:8006/docs
Demo Video: demo_video/demo_video.mov

🔐 Authentication

The system uses JWT authentication with role-based access control.

Demo Accounts

Username	Password	Role	Permissions
`admin`	`admin123`	Admin	Full access (upload, search, generate, analytics)
`researcher`	`research123`	User	Read + Generate (search, content generation, analytics)
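
The permissions in the table above can be read as a simple role-to-action map. The following is a minimal sketch, not the code in auth_service.py: the names `ROLE_PERMISSIONS` and `has_permission` and the action strings are assumptions chosen for illustration.

```python
# Hypothetical permission map derived from the Demo Accounts table.
# The real auth_service.py may model roles and actions differently.
ROLE_PERMISSIONS = {
    "admin": {"upload", "search", "generate", "analytics"},
    "user": {"search", "generate", "analytics"},
}


def has_permission(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    # Unknown roles get an empty permission set, so they are denied by default.
    return action in ROLE_PERMISSIONS.get(role.lower(), set())
```

Under this sketch, `has_permission("User", "upload")` is false, which matches the Admin-only upload described in the table.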

Login Flow

Navigate to http://localhost:3001
The login screen appears; the dashboard stays hidden until you authenticate
Click "Admin Demo" or "Researcher Demo" for quick access
Or use the Login/Register buttons for custom authentication
Dashboard appears after successful authentication

📚 Features Implemented

✅ Core Requirements (100% Complete)

Document Upload & Processing - /api/upload
Semantic Search - /api/similarity_search
Document Retrieval - /api/{journal_id}

✅ Optional Stretch Features (5/5 - 100% Complete)

Frontend UI - Complete React-like interface with modern design
Content Generation APIs - Document summarization, paper comparison, research insights
Persistent Usage Tracking - Database analytics with usage statistics
Unit Tests - 22 unit tests for the authentication service, plus integration tests covering the other components
Authentication & Roles - JWT-based auth with role-based permissions

🛠 Architecture

Backend (FastAPI)

backend/
├── main.py                 # FastAPI application entry point
├── models.py              # Pydantic data models
├── config.py              # Configuration management
├── requirements.txt       # Python dependencies
├── services/
│   ├── vector_service.py     # Pinecone vector database operations
│   ├── embedding_service.py  # OpenAI embeddings generation
│   ├── rag_service.py       # RAG pipeline with content generation
│   └── auth_service.py      # JWT authentication & user management
└── tests/
    └── test_auth_service.py     # Authentication service unit tests (22/22 passing)

Integration Tests

integration_tests/
├── test_basic.py          # Basic functionality tests
├── test_api_basic.py      # API endpoint tests
├── test_upload.py         # Upload functionality tests
├── test_full_system.py    # End-to-end system tests
├── test_openai_connection.py # OpenAI API tests
└── test_api_endpoints.sh  # API testing script

Frontend (Modern Web Technologies)

frontend/
├── index.html            # Main HTML file
├── package.json          # Node.js dependencies
├── vite.config.js       # Vite build configuration
├── js/
│   ├── main.js             # Application initialization
│   ├── api.js              # API communication layer
│   ├── ui.js               # UI management & authentication
│   ├── analytics.js        # Analytics dashboard
│   └── config.js           # Frontend configuration
└── styles/
    ├── main.css            # Core styles & design system
    ├── components.css      # Component-specific styles
    └── responsive.css      # Mobile responsiveness

🔌 API Endpoints

Core Endpoints

PUT /api/upload - Upload and process document chunks
POST /api/similarity_search - Semantic search with AI responses
GET /api/{journal_id} - Retrieve specific document
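
As a rough illustration of calling the search endpoint from Python, the sketch below builds the request without sending it. The request schema is not documented here, so the `query` field name is an assumption; `k` and `min_score` mirror the `DEFAULT_SEARCH` values in frontend/js/config.js, and `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8006"  # backend address from the Quick Start


def build_search_request(query: str, k: int = 10,
                         min_score: float = 0.25) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/similarity_search."""
    # "query" is an assumed field name; check /docs for the actual schema.
    body = json.dumps({"query": query, "k": k, "min_score": min_score})
    return urllib.request.Request(
        f"{API_BASE_URL}/api/similarity_search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request can be sent with `urllib.request.urlopen(build_search_request("cover crops"))` and the response body parsed with `json.load`.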

Content Generation

POST /api/generate/summary - Generate document summaries
POST /api/generate/compare - Compare multiple papers
POST /api/generate/insights - Generate research insights

Authentication

POST /api/auth/login - User authentication
POST /api/auth/register - User registration
GET /api/auth/me - Get current user
PUT /api/protected/upload - Admin-only upload
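
The authentication endpoints are typically used in two steps: log in to obtain a token, then pass it as a bearer token on later calls. The sketch below only builds the requests. It assumes a JSON login body with `username` and `password` fields (a FastAPI backend may instead expect form-encoded OAuth2 credentials), and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8006"  # backend address from the Quick Start


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build a POST request for /api/auth/login (assumed JSON body)."""
    body = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        f"{API_BASE_URL}/api/auth/login",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_me_request(token: str) -> urllib.request.Request:
    """Build a GET request for /api/auth/me carrying a bearer token."""
    return urllib.request.Request(
        f"{API_BASE_URL}/api/auth/me",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

The name of the token field in the login response is not stated in this README, so extracting it is left to the caller.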

Analytics

GET /api/analytics/usage - Usage statistics
GET /api/analytics/popular - Popular papers

📊 Content Generation Examples

Document Summarization

curl -X POST http://localhost:8006/api/generate/summary \
  -H "Content-Type: application/json" \
  -d '{
    "journal_ids": ["extension_brief_mucuna.pdf"],
    "summary_type": "abstract"
  }'

Paper Comparison

curl -X POST http://localhost:8006/api/generate/compare \
  -H "Content-Type: application/json" \
  -d '{
    "journal_ids": ["paper1.pdf", "paper2.pdf"],
    "comparison_aspects": ["methodology", "results"]
  }'

Research Insights

curl -X POST http://localhost:8006/api/generate/insights \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "machine learning in agriculture",
    "time_range": "2020-2023",
    "insight_type": "trends"
  }'

🎬 Demo Materials

Demo Video & GIF

Video: demo_video/demo_video.mov - Complete demonstration
GIF: demo_video/demo.gif - Quick visual overview
Content: Complete demonstration of all required APIs from the challenge PDF
Shows: Upload sample data, similarity search, document retrieval, frontend interface

Database Management

# Clear database for a fresh demo (run from the repository root)
cd backend && python clear_database.py

# This script will:
# - Connect to Pinecone Local
# - Show current vector count
# - Clear all vectors for clean demo start
# - Provide next steps for demo recording

🧪 Testing

Backend Unit Tests ✅

# Run backend unit tests (22/22 passing - 100% success rate)
cd backend && python -m pytest tests/ -v

# Run specific authentication tests
cd backend && python -m pytest tests/test_auth_service.py -v

Integration Tests ✅

# Basic functionality tests (no API keys needed)
python integration_tests/test_basic.py                    # 5/5 tests passed
python integration_tests/test_api_basic.py                # 4/4 tests passed

# API connection tests (requires API keys)
python integration_tests/test_openai_connection.py        # 2/2 tests passed

# Full system tests (requires all services)
python integration_tests/test_full_system.py              # Complete system test passed
python integration_tests/test_upload.py                   # Upload workflow test

# Live API endpoint tests (requires running server)
bash integration_tests/test_api_endpoints.sh              # All endpoints working

Test Results Summary ✅

Backend Unit Tests: 22/22 passing (100% success rate)
Integration Tests: 5 Python test files and 1 shell script, all passing
API Endpoints: 8+ endpoints tested and working with live server
System Components: Vector DB, OpenAI, RAG pipeline all validated
Professional Structure: Clean separation of unit vs integration tests

Test Coverage

Backend Unit Tests: Authentication service (JWT, passwords, permissions)
Integration Tests: API endpoints, upload workflows, system components
End-to-End Tests: Complete system functionality with real APIs
Live API Tests: All endpoints working with running server
Service Tests: Vector DB, embeddings, RAG pipeline, health checks

🛡 Security Features

JWT Authentication with 30-minute token expiration
Password Hashing using bcrypt
Role-Based Access Control (Admin vs User permissions)
CORS Protection with configurable origins
Input Validation using Pydantic models
Authentication Guards protecting frontend routes
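
To make the 30-minute token expiration concrete, here is a minimal standard-library sketch of issuing and verifying an HS256-signed JWT. It is not the project's implementation, which may well use a JWT library; the function names, the claim names `sub` and `role`, and the secret handling are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

TOKEN_TTL_SECONDS = 30 * 60  # 30-minute expiration, as listed above


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def create_token(username: str, role: str, secret: str, now=None) -> str:
    """Issue a signed token that expires TOKEN_TTL_SECONDS after `now`."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "role": role,
               "exp": int(now) + TOKEN_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the claims if the signature is valid and the token is unexpired."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    expected = _sign(parts[0] + "." + parts[1], secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(parts[2])):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(parts[1]))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Passing `now` explicitly keeps the expiry check easy to test. Password hashing with bcrypt is left out because it needs a third-party package.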

📈 Analytics Dashboard

The system includes a real-time analytics dashboard showing:

Document indexing statistics
Search performance metrics
System health monitoring
Usage tracking and popular papers
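
The "popular papers" metric can be thought of as a per-document hit count. The sketch below keeps counts in memory only for illustration; the README describes persistent, database-backed tracking, and the `UsageTracker` class and its method names are hypothetical.

```python
from collections import Counter


class UsageTracker:
    """In-memory stand-in for usage tracking (the real system persists data)."""

    def __init__(self) -> None:
        self._hits = Counter()

    def record(self, journal_id: str) -> None:
        # Count one retrieval or search hit for the given document.
        self._hits[journal_id] += 1

    def popular(self, n: int = 5) -> list:
        # Return (journal_id, hits) pairs, most frequently used first.
        return self._hits.most_common(n)
```

A persistent version would write each `record` call to the database and compute `popular` with an aggregate query.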

🎨 UI/UX Features

Modern Design with gradient themes and smooth animations
Responsive Layout supporting desktop, tablet, and mobile
Dark Mode design with professional styling
Loading States with animated progress indicators
Error Handling with user-friendly messages
Accessibility with proper ARIA labels and keyboard navigation

🔧 Configuration

Backend Configuration (`backend/.env`)

OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX_NAME=research-assistant
DEBUG=true
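
One way these variables might be read at startup is sketched below. It is an illustration rather than the contents of config.py: the `Settings` dataclass and `load_settings` function are hypothetical, and treating the first three variables as required is an assumption.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    pinecone_api_key: str
    pinecone_environment: str
    pinecone_index_name: str = "research-assistant"
    debug: bool = False


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail fast with a clear message instead of failing later in a request.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        pinecone_environment=env["PINECONE_ENVIRONMENT"],
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "research-assistant"),
        debug=env.get("DEBUG", "false").strip().lower() == "true",
    )
```

Note that this sketch reads variables already present in the environment; loading the `.env` file itself would need a separate step or package.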

Frontend Configuration (`frontend/js/config.js`)

const CONFIG = {
  API_BASE_URL: 'http://localhost:8006',
  DEFAULT_SEARCH: {
    k: 10,
    min_score: 0.25
  }
};
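
The `k` and `min_score` defaults suggest that search results are capped at 10 and filtered by a minimum similarity score. The sketch below shows that interpretation in Python; whether the backend applies the threshold exactly this way is an assumption, and `filter_matches` and the `score` key are hypothetical names.

```python
def filter_matches(matches, k=10, min_score=0.25):
    """Keep matches scoring at least min_score, best first, at most k of them."""
    kept = [m for m in matches if m["score"] >= min_score]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return kept[:k]
```

For example, with `min_score=0.25` a match scored 0.20 is dropped even if fewer than `k` results remain.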

📄 License

This project is licensed under the MIT License.

👥 Authors

GenAI Labs Candidate - Full Stack Implementation - Research Assistant AI System

Built for GenAI Labs BE/AI Candidate Challenge | All 5 Optional Stretch Features Completed ✅

Deadsec69/research-assistant-ai-challenge