An intelligent legal assistant that leverages Natural Language Processing (NLP), vector search, and open-source language models to provide legal guidance based on authoritative legal texts.
- 💬 User Input: The user submits a legal question or prompt through a simple interface.
- 📚 Semantic Search: The prompt is embedded and used to query a vector database containing semantic representations (embeddings) of a legal book.
- 📖 Contextual Retrieval: Relevant sections of the legal text are retrieved based on semantic similarity to the user's query.
- 🧠 AI-Powered Response: The original query and the retrieved context are passed to GPT-Neo 1.3B, which generates a response grounded in the legal material.
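The retrieve-then-generate flow above can be sketched end to end. The bag-of-words vectors and cosine similarity below are toy stand-ins for the real embedding model and vector database, and all function and variable names are illustrative, not from the project:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system uses a sentence-transformer model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pre-embedded chunks of the legal text (stands in for the vector database).
chunks = [
    "A contract requires offer, acceptance, and consideration.",
    "Negligence requires a duty of care, breach, causation, and damages.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank stored chunks by semantic similarity to the query embedding.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

query = "What are the elements of negligence?"
context = retrieve(query)[0]
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# `prompt` would then be passed to the language model for generation.
```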
- 🧾 Text Embeddings: Converts legal content into high-dimensional vectors for semantic search.
- 📊 Vector Database: Efficient document storage and retrieval using FAISS, Pinecone, or similar tools.
- 🧠 GPT-Neo 1.3B: Open-source transformer model to generate legal insights.
- 🐍 Python: Core language for backend logic and system orchestration.
- 🧑‍🎓 Law Student Support: Quickly understand legal concepts and references.
- 🧾 Legal Research Assistant: Automate lookup of relevant sections from legal texts.
- 🧠 General Legal Literacy: Make legal knowledge more accessible to non-lawyers.
⚠️ Disclaimer: This tool provides AI-generated responses for educational and informational purposes only. It is not a substitute for professional legal advice.
rag-system/
├── chunking/          # Document loading and chunking
├── embeddings/        # Embedding logic
├── vector_store/      # FAISS-based vector retrieval
├── prompts/           # Prompt templating
├── llm_interface/     # Local LLM loading (HuggingFace)
├── scripts/           # Main execution scripts
├── dataset/
│   └── raw/           # Place your .pdf files here
├── requirements.txt
└── README.html
pip install -r requirements.txt
Place your .pdf files in:
dataset/raw/
python scripts/run_rag.py
- Corpus Loader: Loads and chunks PDFs.
- Embeddings: Uses HuggingFace models such as all-MiniLM-L6-v2.
- Vector Store: FAISS-powered similarity search.
- Prompting: Dynamic prompt construction from retrieved docs.
- LLM Inference: Runs a local model (e.g., EleutherAI/gpt-neo-1.3B) using the Transformers pipeline.
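The corpus-loading and prompting steps can be sketched as follows. The chunk size, overlap, and prompt template are illustrative assumptions, not the project's actual values:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split extracted PDF text into overlapping word windows so that sentences
    # spanning a chunk boundary still appear intact in at least one chunk.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

def build_prompt(query: str, retrieved: list[str]) -> str:
    # Assemble the retrieved chunks into a grounded prompt for the local LLM.
    context = "\n\n".join(retrieved)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The overlap keeps retrieval robust at chunk boundaries at the cost of some duplicated storage, a common trade-off in RAG pipelines.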
You can replace:
- PDF loader with web or CSV input.
- FAISS with Chroma or other vector stores.
- LLM with OpenAI, Claude, LLaMA, etc.
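One way to keep these components swappable is to code against a small interface. The `Protocol` below is a suggested pattern, not part of the project; a FAISS- or Chroma-backed class would simply expose the same two methods:

```python
from typing import Protocol

class VectorStore(Protocol):
    def add(self, doc_id: str, vector: list[float]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...

class InMemoryStore:
    # Minimal stand-in implementation for illustration only.
    def __init__(self) -> None:
        self._docs: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._docs[doc_id] = vector

    def search(self, vector: list[float], k: int) -> list[str]:
        # Return the k doc ids nearest to the query vector (squared L2 distance).
        def dist(v: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(vector, v))
        return sorted(self._docs, key=lambda d: dist(self._docs[d]))[:k]

store: VectorStore = InMemoryStore()
store.add("contracts", [1.0, 0.0])
store.add("torts", [0.0, 1.0])
```

Because the rest of the pipeline only calls `add` and `search`, swapping the backing store never touches the chunking, prompting, or inference code.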
This project is open-source and available under the MIT License.
Mohsin Raza