A TypeScript-based Model Context Protocol (MCP) server that provides local-first document management and semantic search using embeddings. The server exposes a collection of MCP tools and is optimized for performance with on-disk persistence, an in-memory index, and caching.
- O(1) document lookup and keyword index through `DocumentIndex` for fast chunk and document retrieval.
- LRU `EmbeddingCache` to avoid recomputing embeddings and to speed up repeated queries.
- Parallel chunking and batch processing to accelerate ingestion of large documents.
- Streaming file reader to process large files without high memory usage.
- Chunk-based semantic search with context-window retrieval to gather surrounding chunks for better LLM answers (see the sketch after this list).
- Local-only storage: no external database required. All data resides in `~/.mcp-documentation-server/`.
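To make the context-window idea concrete, here is a minimal sketch of collecting the chunks around a search hit. The types and the function name are illustrative only; they are not the server's actual internals.

```typescript
// Minimal sketch of context-window retrieval: given the index of a matching
// chunk, return it together with the chunks immediately before and after it,
// so the caller can hand a contiguous passage to an LLM.
interface Chunk {
  index: number;
  text: string;
}

function getContextWindow(
  chunks: Chunk[],
  hitIndex: number,
  before: number,
  after: number
): Chunk[] {
  const start = Math.max(0, hitIndex - before);
  const end = Math.min(chunks.length, hitIndex + after + 1); // slice end is exclusive
  return chunks.slice(start, end);
}
```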
Run directly with npx (recommended):
```bash
npx @andrea9293/mcp-documentation-server
```
Example configuration for an MCP client (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "documentation": {
      "command": "npx",
      "args": [
        "-y",
        "@andrea9293/mcp-documentation-server"
      ],
      "env": {
        "MCP_EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
      }
    }
  }
}
```
- Add documents using the `add_document` tool, or by placing `.txt`, `.md`, or `.pdf` files into the uploads folder and calling `process_uploads`.
- Search documents with `search_documents` to get ranked chunk hits.
- Use `get_context_window` to fetch neighboring chunks and provide LLMs with richer context (an end-to-end client sketch follows this list).
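As a rough end-to-end illustration of that workflow, the sketch below drives the server from a TypeScript MCP client. It assumes the official `@modelcontextprotocol/sdk` client API; the document id used for the search is a placeholder.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, the same way an MCP client such as Claude Desktop would.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@andrea9293/mcp-documentation-server"],
});
const client = new Client({ name: "docs-example", version: "1.0.0" });
await client.connect(transport);

// Add a document, then search within it. "doc-123" is a placeholder id.
await client.callTool({
  name: "add_document",
  arguments: {
    title: "Python Basics",
    content: "Python is a high-level programming language...",
    metadata: { category: "programming" },
  },
});

const hits = await client.callTool({
  name: "search_documents",
  arguments: { document_id: "doc-123", query: "variable assignment", limit: 5 },
});
console.log(hits);

await client.close();
```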
- Document management: add, list, retrieve, and delete documents and metadata.
- Semantic search: chunk-level search using embeddings plus an in-memory keyword index.
- `DocumentIndex`: constant-time lookups for documents and chunks; supports deduplication and a persisted index file.
- `EmbeddingCache`: configurable LRU cache for embedding vectors to reduce recomputation and speed up repeated requests (a minimal sketch follows this list).
- Parallel and batch chunking: ingestion is parallelized for large documents to improve throughput.
- Streaming file processing: large files are processed in a streaming manner to avoid excessive memory usage.
- Context window retrieval: fetch N chunks before/after a hit to assemble full context for LLM prompts.
- Local-first persistence: documents and index are stored as JSON files under the user's data directory.
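The LRU behaviour of the embedding cache can be pictured with the following minimal sketch; the class and method names are illustrative and are not the server's actual `EmbeddingCache` API.

```typescript
// Minimal LRU cache sketch: repeated texts reuse their cached embedding
// vector instead of being recomputed. Names are illustrative only.
class LruEmbeddingCache {
  private cache = new Map<string, number[]>();

  constructor(private maxSize = 1000) {} // mirrors the MCP_CACHE_SIZE default

  get(text: string): number[] | undefined {
    const vector = this.cache.get(text);
    if (vector !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.cache.delete(text);
      this.cache.set(text, vector);
    }
    return vector;
  }

  set(text: string, vector: number[]): void {
    if (this.cache.has(text)) this.cache.delete(text);
    this.cache.set(text, vector);
    if (this.cache.size > this.maxSize) {
      // Map preserves insertion order, so the first key is the least recently used.
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }
}
```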
The server exposes several tools (validated with Zod schemas) for document lifecycle and search; a hedged schema sketch follows the list.
- `add_document` — Add a document (title, content, metadata)
- `list_documents` — List stored documents and metadata
- `get_document` — Retrieve a full document by id
- `delete_document` — Remove a document and its chunks
- `process_uploads` — Convert files in the uploads folder into documents (chunking + embeddings)
- `get_uploads_path` — Return the absolute uploads folder path
- `list_uploads_files` — List files in the uploads folder
- `search_documents` — Semantic search within a document (returns chunk hits and an LLM hint)
- `get_context_window` — Return a window of chunks around a target chunk index
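To show what the Zod validation might look like, here is a sketch of an argument schema for `search_documents`. The field names are taken from the examples further down, but the constraints and the default limit are assumptions rather than the server's actual schema.

```typescript
import { z } from "zod";

// Hypothetical argument schema for search_documents. Constraints and the
// default limit are assumptions for illustration only.
const searchDocumentsArgs = z.object({
  document_id: z.string().min(1),
  query: z.string().min(1),
  limit: z.number().int().positive().default(5),
});

type SearchDocumentsArgs = z.infer<typeof searchDocumentsArgs>;

// Validate an incoming tool call's arguments before running the search.
const args: SearchDocumentsArgs = searchDocumentsArgs.parse({
  document_id: "doc-123",
  query: "variable assignment",
});
```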
Configure behavior via environment variables. Important options:
- `MCP_EMBEDDING_MODEL` — embedding model name (default: `Xenova/all-MiniLM-L6-v2`). Changing the model requires re-adding documents. Any Xenova feature-extraction model can be used.
- `MCP_INDEXING_ENABLED` — enable/disable the `DocumentIndex` (true/false). Default: `true`.
- `MCP_CACHE_SIZE` — LRU embedding cache size (integer). Default: `1000`.
- `MCP_PARALLEL_ENABLED` — enable parallel chunking (true/false). Default: `true`.
- `MCP_MAX_WORKERS` — number of parallel workers for chunking/indexing. Default: `4`.
- `MCP_STREAMING_ENABLED` — enable streaming reads for large files. Default: `true`.
- `MCP_STREAM_CHUNK_SIZE` — streaming buffer size in bytes. Default: `65536` (64 KB).
- `MCP_STREAM_FILE_SIZE_LIMIT` — file-size threshold (bytes) above which the streaming path is used. Default: `10485760` (10 MB).
Example `.env` (defaults applied when variables are not set):

```bash
MCP_INDEXING_ENABLED=true            # Enable O(1) indexing (default: true)
MCP_CACHE_SIZE=1000                  # LRU cache size (default: 1000)
MCP_PARALLEL_ENABLED=true            # Enable parallel processing (default: true)
MCP_MAX_WORKERS=4                    # Parallel worker count (default: 4)
MCP_STREAMING_ENABLED=true           # Enable streaming (default: true)
MCP_STREAM_CHUNK_SIZE=65536          # Stream chunk size (default: 64KB)
MCP_STREAM_FILE_SIZE_LIMIT=10485760  # Streaming threshold (default: 10MB)
```
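For illustration, reading this configuration with the documented defaults could look like the sketch below; only the variable names and default values come from this README, the surrounding code is hypothetical.

```typescript
// Hypothetical configuration loader. Only the environment variable names and
// their defaults come from the README; everything else is illustrative.
const config = {
  indexingEnabled: (process.env.MCP_INDEXING_ENABLED ?? "true") === "true",
  cacheSize: Number(process.env.MCP_CACHE_SIZE ?? 1000),
  parallelEnabled: (process.env.MCP_PARALLEL_ENABLED ?? "true") === "true",
  maxWorkers: Number(process.env.MCP_MAX_WORKERS ?? 4),
  streamingEnabled: (process.env.MCP_STREAMING_ENABLED ?? "true") === "true",
  streamChunkSize: Number(process.env.MCP_STREAM_CHUNK_SIZE ?? 65536), // 64 KB
  streamFileSizeLimit: Number(process.env.MCP_STREAM_FILE_SIZE_LIMIT ?? 10485760), // 10 MB
};
```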
Default storage layout (data directory):
```
~/.mcp-documentation-server/
├── data/      # Document JSON files
└── uploads/   # Drop files (.txt, .md, .pdf) to import
```
Add a document via MCP tool:
```json
{
  "tool": "add_document",
  "arguments": {
    "title": "Python Basics",
    "content": "Python is a high-level programming language...",
    "metadata": {
      "category": "programming",
      "tags": ["python", "tutorial"]
    }
  }
}
```
Search a document:
```json
{
  "tool": "search_documents",
  "arguments": {
    "document_id": "doc-123",
    "query": "variable assignment",
    "limit": 5
  }
}
```
Fetch context window:
```json
{
  "tool": "get_context_window",
  "arguments": {
    "document_id": "doc-123",
    "chunk_index": 5,
    "before": 2,
    "after": 2
  }
}
```
- Embedding models are downloaded on first use; some models require several hundred MB of downloads.
- The `DocumentIndex` persists an index file and can be rebuilt if necessary.
- The `EmbeddingCache` can be warmed by calling `process_uploads`, issuing curated queries, or using a preload API when available (see the warming sketch after this list).
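A rough sketch of warming the cache with curated queries, again assuming the official `@modelcontextprotocol/sdk` client; the document id and the query list are placeholders.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Run a few representative queries up front so later interactive searches
// reuse cached embedding vectors. Queries and the document id are placeholders.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@andrea9293/mcp-documentation-server"],
});
const client = new Client({ name: "cache-warmer", version: "1.0.0" });
await client.connect(transport);

for (const query of ["variable assignment", "error handling", "async functions"]) {
  await client.callTool({
    name: "search_documents",
    arguments: { document_id: "doc-123", query, limit: 3 },
  });
}
await client.close();
```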
Set via the `MCP_EMBEDDING_MODEL` environment variable:
- `Xenova/all-MiniLM-L6-v2` (default) - Fast, good quality (384 dimensions)
- `Xenova/paraphrase-multilingual-mpnet-base-v2` (recommended) - Best quality, multilingual (768 dimensions)

The system automatically manages the correct embedding dimension for each model. Embedding providers expose their dimension via `getDimensions()`; a hedged provider sketch follows.
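The provider shape implied above might look roughly like this; only `getDimensions()` is named in the README, the rest of the interface is an assumption.

```typescript
// Illustrative embedding-provider shape: the provider reports its vector
// dimension so stored embeddings can be checked against the active model.
// Only getDimensions() is named in the README; the rest is assumed.
interface EmbeddingProvider {
  getDimensions(): number;
  embed(text: string): Promise<number[]>;
}

// Example check: stored vectors from a different model must not be reused.
function assertCompatible(provider: EmbeddingProvider, storedDimension: number): void {
  if (provider.getDimensions() !== storedDimension) {
    throw new Error(
      `Stored embeddings have ${storedDimension} dimensions but the current model ` +
        `produces ${provider.getDimensions()}; re-add documents after changing models.`
    );
  }
}
```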
```bash
git clone https://github.com/andrea9293/mcp-documentation-server.git
cd mcp-documentation-server
npm run dev
npm run build
npm run inspect
```
- Fork the repository
- Create a feature branch: `git checkout -b feature/name`
- Follow Conventional Commits for messages
- Open a pull request
MIT - see LICENSE file
Built with FastMCP and TypeScript 🚀