This is a tiny, fully local Retrieval-Augmented QA system you can run in a few minutes. It demonstrates the full RAG loop using free components only.
- Ingest: Read PDFs/Markdown/TXT → split into overlapping chunks (to keep context).
- Embed: Convert chunks to vectors with a lightweight model (`all-MiniLM-L6-v2`).
- Index: Store vectors in FAISS (fast vector search) + Parquet metadata.
- Query: User question → embed → retrieve top-k similar chunks.
- Answer: Show evidence chunks and a simple extractive "answer" assembled from top hits.

Abstractive answering (a generated summary) can be plugged in later with a small local model.
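A minimal sketch of the ingest half of this loop, assuming a character-based splitter with the defaults noted further down; the file path and function name are illustrative, not the actual `splitter.py` / `embedder.py` API:

```python
from sentence_transformers import SentenceTransformer

def split_text(text: str, chunk_size: int = 800, overlap: int = 120) -> list[str]:
    """Cut text into fixed-size windows that overlap, so context survives chunk borders."""
    chunks, start = [], 0
    step = chunk_size - overlap  # advance less than a full chunk each time
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = split_text(open("data/source/example.txt", encoding="utf-8").read())
# normalize_embeddings=True gives unit-length vectors, so inner product == cosine later
vectors = model.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, 384)
```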
```bash
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
# .venv\Scripts\Activate.ps1
pip install -r requirements.txt
streamlit run app.py
```
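The real dependency list lives in `requirements.txt`; a plausible minimal set, inferred from the components named in this README (the PDF reader, `pypdf`, is an assumption since the README doesn't name one):

```text
streamlit
sentence-transformers
faiss-cpu
pandas
pyarrow
pypdf
```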
Then in the UI:
- Upload 1+ documents (or place them in `data/source/`).
- Click Build / Refresh Index.
- Ask a question and inspect the Evidence + Answer sections.
- Cosine similarity via inner product: Normalize embeddings and use FAISS's `IndexFlatIP` (see the sketch after this list).
- Chunking: Default `chunk_size=800`, `overlap=120`. Adjust for your content density.
- Local-first: No paid APIs. Everything runs on CPU.
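Continuing from the ingest sketch above, a minimal demonstration of the cosine-via-inner-product trick; `faiss.IndexFlatIP`, `write_index`, and `search` are standard FAISS calls, while the query string and paths are illustrative:

```python
import faiss
import numpy as np

dim = vectors.shape[1]           # 384 for all-MiniLM-L6-v2
index = faiss.IndexFlatIP(dim)   # inner product == cosine on normalized vectors
index.add(np.asarray(vectors, dtype=np.float32))
faiss.write_index(index, "data/store/index.faiss")

# Query: embed the question the same way, then pull the top-k chunks.
question = model.encode(["How are documents chunked?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(question, dtype=np.float32), 5)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {chunks[i][:80]}")

# Naive extractive "answer": stitch together the best-scoring chunks verbatim
answer = "\n---\n".join(chunks[i] for i in ids[0][:3])
```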
```text
rag-docs/
├─ app.py              # Streamlit UI: upload → build index → query
├─ ingest.py           # Ingestion pipeline (read/clean → split → embed → index)
├─ rag_core/
│  ├─ splitter.py      # Chunking utilities (with comments)
│  ├─ embedder.py      # Wrapper around SentenceTransformer (normalized vectors)
│  ├─ index.py         # FAISS + metadata persistence
│  └─ answer.py        # Naive extractive answer composer
├─ data/source/        # Put your PDFs/MD/TXT here (UI saves uploads here too)
└─ data/store/         # FAISS index + Parquet metadata live here
```
- Add abstractive summarization (e.g., `google/flan-t5-small`) over retrieved chunks (sketched below).
- Highlight/cite exact phrases that matched the query.
- Add OCR for scanned PDFs (Tesseract) if needed.
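For the first item, a sketch of how abstractive answering could be bolted on with Hugging Face `transformers`; the prompt format and chunk budget are assumptions, not part of this repo:

```python
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

def abstractive_answer(question: str, evidence: list[str]) -> str:
    # flan-t5-small has a short context window, so stuff in only the top few chunks
    context = "\n\n".join(evidence[:3])
    prompt = (f"Answer the question using only the context.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generator(prompt, max_new_tokens=128)[0]["generated_text"]
```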