A TypeScript-based Model Context Protocol (MCP) server that provides local-first document management and semantic search using embeddings. The server exposes a collection of MCP tools and is optimized for performance with on-disk persistence, an in-memory index, and caching.
- O(1) document lookup and keyword index through `DocumentIndex` for fast chunk and document retrieval.
- LRU `EmbeddingCache` to avoid recomputing embeddings and to speed up repeated queries.
- Parallel chunking and batch processing to accelerate ingestion of large documents.
- Streaming file reader to process large files without high memory usage.
- Chunk-based semantic search with context-window retrieval to gather surrounding chunks for better LLM answers (see the sketch after this list).
- Local-only storage: no external database required. All data resides in `~/.mcp-documentation-server/`.
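To make the context-window idea concrete, here is a minimal sketch of collecting the chunks around a search hit. The types and the function name are illustrative only; they are not the server's actual internals.

```typescript
// Minimal sketch of context-window retrieval: given the index of a matching
// chunk, return it together with the chunks immediately before and after it,
// so the caller can hand a contiguous passage to an LLM.
interface Chunk {
  index: number;
  text: string;
}

function getContextWindow(
  chunks: Chunk[],
  hitIndex: number,
  before: number,
  after: number
): Chunk[] {
  const start = Math.max(0, hitIndex - before);
  const end = Math.min(chunks.length, hitIndex + after + 1); // slice end is exclusive
  return chunks.slice(start, end);
}
```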
Run directly with npx (recommended):
```bash
npx @andrea9293/mcp-documentation-server
```
Example configuration for an MCP client (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "documentation": {
      "command": "npx",
      "args": [
        "-y",
        "@andrea9293/mcp-documentation-server"
      ],
      "env": {
        "MCP_EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
      }
    }
  }
}
```
- Add documents using the `add_document` tool, or by placing `.txt`, `.md`, or `.pdf` files into the uploads folder and calling `process_uploads`.
- Search documents with `search_documents` to get ranked chunk hits.
- Use `get_context_window` to fetch neighboring chunks and provide LLMs with richer context (an end-to-end client sketch follows this list).
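As a rough end-to-end illustration of that workflow, the sketch below drives the server from a TypeScript MCP client. It assumes the official `@modelcontextprotocol/sdk` client API; the document id used for the search is a placeholder.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, the same way an MCP client such as Claude Desktop would.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@andrea9293/mcp-documentation-server"],
});
const client = new Client({ name: "docs-example", version: "1.0.0" });
await client.connect(transport);

// Add a document, then search within it. "doc-123" is a placeholder id.
await client.callTool({
  name: "add_document",
  arguments: {
    title: "Python Basics",
    content: "Python is a high-level programming language...",
    metadata: { category: "programming" },
  },
});

const hits = await client.callTool({
  name: "search_documents",
  arguments: { document_id: "doc-123", query: "variable assignment", limit: 5 },
});
console.log(hits);

await client.close();
```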
- Document management: add, list, retrieve, and delete documents and metadata.
- Semantic search: chunk-level search using embeddings plus an in-memory keyword index.
- `DocumentIndex`: constant-time lookups for documents and chunks; supports deduplication and a persisted index file.
- `EmbeddingCache`: configurable LRU cache for embedding vectors to reduce recomputation and speed up repeated requests (a minimal sketch follows this list).
- Parallel and batch chunking: ingestion is parallelized for large documents to improve throughput.
- Streaming file processing: large files are processed in a streaming manner to avoid excessive memory usage.
- Context window retrieval: fetch N chunks before/after a hit to assemble full context for LLM prompts.
- Local-first persistence: documents and index are stored as JSON files under the user's data directory.
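The LRU behaviour of the embedding cache can be pictured with the following minimal sketch; the class and method names are illustrative and are not the server's actual `EmbeddingCache` API.

```typescript
// Minimal LRU cache sketch: repeated texts reuse their cached embedding
// vector instead of being recomputed. Names are illustrative only.
class LruEmbeddingCache {
  private cache = new Map<string, number[]>();

  constructor(private maxSize = 1000) {} // mirrors the MCP_CACHE_SIZE default

  get(text: string): number[] | undefined {
    const vector = this.cache.get(text);
    if (vector !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.cache.delete(text);
      this.cache.set(text, vector);
    }
    return vector;
  }

  set(text: string, vector: number[]): void {
    if (this.cache.has(text)) this.cache.delete(text);
    this.cache.set(text, vector);
    if (this.cache.size > this.maxSize) {
      // Map preserves insertion order, so the first key is the least recently used.
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }
}
```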
The server exposes several tools (validated with Zod schemas) for document lifecycle and search; a hedged schema sketch follows the list.
- `add_document` — Add a document (title, content, metadata)
- `list_documents` — List stored documents and metadata
- `get_document` — Retrieve a full document by id
- `delete_document` — Remove a document and its chunks
- `process_uploads` — Convert files in the uploads folder into documents (chunking + embeddings)
- `get_uploads_path` — Return the absolute uploads folder path
- `list_uploads_files` — List files in the uploads folder
- `search_documents` — Semantic search within a document (returns chunk hits and an LLM hint)
- `get_context_window` — Return a window of chunks around a target chunk index
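To show what the Zod validation might look like, here is a sketch of an argument schema for `search_documents`. The field names are taken from the examples further down, but the constraints and the default limit are assumptions rather than the server's actual schema.

```typescript
import { z } from "zod";

// Hypothetical argument schema for search_documents. Constraints and the
// default limit are assumptions for illustration only.
const searchDocumentsArgs = z.object({
  document_id: z.string().min(1),
  query: z.string().min(1),
  limit: z.number().int().positive().default(5),
});

type SearchDocumentsArgs = z.infer<typeof searchDocumentsArgs>;

// Validate an incoming tool call's arguments before running the search.
const args: SearchDocumentsArgs = searchDocumentsArgs.parse({
  document_id: "doc-123",
  query: "variable assignment",
});
```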
Configure behavior via environment variables. Important options:
- `MCP_EMBEDDING_MODEL` — embedding model name (default: `Xenova/all-MiniLM-L6-v2`). Changing the model requires re-adding documents. Any Xenova feature-extraction model can be used.
- `MCP_INDEXING_ENABLED` — enable/disable the `DocumentIndex` (true/false). Default: `true`.
- `MCP_CACHE_SIZE` — LRU embedding cache size (integer). Default: `1000`.
- `MCP_PARALLEL_ENABLED` — enable parallel chunking (true/false). Default: `true`.
- `MCP_MAX_WORKERS` — number of parallel workers for chunking/indexing. Default: `4`.
- `MCP_STREAMING_ENABLED` — enable streaming reads for large files. Default: `true`.
- `MCP_STREAM_CHUNK_SIZE` — streaming buffer size in bytes. Default: `65536` (64 KB).
- `MCP_STREAM_FILE_SIZE_LIMIT` — file-size threshold (bytes) above which the streaming path is used. Default: `10485760` (10 MB).
Example `.env` (defaults applied when variables are not set):

```bash
MCP_INDEXING_ENABLED=true            # Enable O(1) indexing (default: true)
MCP_CACHE_SIZE=1000                  # LRU cache size (default: 1000)
MCP_PARALLEL_ENABLED=true            # Enable parallel processing (default: true)
MCP_MAX_WORKERS=4                    # Parallel worker count (default: 4)
MCP_STREAMING_ENABLED=true           # Enable streaming (default: true)
MCP_STREAM_CHUNK_SIZE=65536          # Stream chunk size (default: 64KB)
MCP_STREAM_FILE_SIZE_LIMIT=10485760  # Streaming threshold (default: 10MB)
```
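For illustration, reading this configuration with the documented defaults could look like the sketch below; only the variable names and default values come from this README, the surrounding code is hypothetical.

```typescript
// Hypothetical configuration loader. Only the environment variable names and
// their defaults come from the README; everything else is illustrative.
const config = {
  indexingEnabled: (process.env.MCP_INDEXING_ENABLED ?? "true") === "true",
  cacheSize: Number(process.env.MCP_CACHE_SIZE ?? 1000),
  parallelEnabled: (process.env.MCP_PARALLEL_ENABLED ?? "true") === "true",
  maxWorkers: Number(process.env.MCP_MAX_WORKERS ?? 4),
  streamingEnabled: (process.env.MCP_STREAMING_ENABLED ?? "true") === "true",
  streamChunkSize: Number(process.env.MCP_STREAM_CHUNK_SIZE ?? 65536), // 64 KB
  streamFileSizeLimit: Number(process.env.MCP_STREAM_FILE_SIZE_LIMIT ?? 10485760), // 10 MB
};
```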
Default storage layout (data directory):
```
~/.mcp-documentation-server/
├── data/      # Document JSON files
└── uploads/   # Drop files (.txt, .md, .pdf) to import
```
Add a document via MCP tool:
```json
{
  "tool": "add_document",
  "arguments": {
    "title": "Python Basics",
    "content": "Python is a high-level programming language...",
    "metadata": {
      "category": "programming",
      "tags": ["python", "tutorial"]
    }
  }
}
```
Search a document:
```json
{
  "tool": "search_documents",
  "arguments": {
    "document_id": "doc-123",
    "query": "variable assignment",
    "limit": 5
  }
}
```
Fetch context window:
```json
{
  "tool": "get_context_window",
  "arguments": {
    "document_id": "doc-123",
    "chunk_index": 5,
    "before": 2,
    "after": 2
  }
}
```
- Embedding models are downloaded on first use; some models require several hundred MB of downloads.
- The `DocumentIndex` persists an index file and can be rebuilt if necessary.
- The `EmbeddingCache` can be warmed by calling `process_uploads`, issuing curated queries, or using a preload API when available (see the warming sketch after this list).
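A rough sketch of warming the cache with curated queries, again assuming the official `@modelcontextprotocol/sdk` client; the document id and the query list are placeholders.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Run a few representative queries up front so later interactive searches
// reuse cached embedding vectors. Queries and the document id are placeholders.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@andrea9293/mcp-documentation-server"],
});
const client = new Client({ name: "cache-warmer", version: "1.0.0" });
await client.connect(transport);

for (const query of ["variable assignment", "error handling", "async functions"]) {
  await client.callTool({
    name: "search_documents",
    arguments: { document_id: "doc-123", query, limit: 3 },
  });
}
await client.close();
```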
Set via the `MCP_EMBEDDING_MODEL` environment variable:
- `Xenova/all-MiniLM-L6-v2` (default) - Fast, good quality (384 dimensions)
- `Xenova/paraphrase-multilingual-mpnet-base-v2` (recommended) - Best quality, multilingual (768 dimensions)

The system automatically manages the correct embedding dimension for each model. Embedding providers expose their dimension via `getDimensions()`; a hedged provider sketch follows.
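The provider shape implied above might look roughly like this; only `getDimensions()` is named in the README, the rest of the interface is an assumption.

```typescript
// Illustrative embedding-provider shape: the provider reports its vector
// dimension so stored embeddings can be checked against the active model.
// Only getDimensions() is named in the README; the rest is assumed.
interface EmbeddingProvider {
  getDimensions(): number;
  embed(text: string): Promise<number[]>;
}

// Example check: stored vectors from a different model must not be reused.
function assertCompatible(provider: EmbeddingProvider, storedDimension: number): void {
  if (provider.getDimensions() !== storedDimension) {
    throw new Error(
      `Stored embeddings have ${storedDimension} dimensions but the current model ` +
        `produces ${provider.getDimensions()}; re-add documents after changing models.`
    );
  }
}
```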
```bash
git clone https://github.com/andrea9293/mcp-documentation-server.git
cd mcp-documentation-server
npm run dev
npm run build
npm run inspect
```
- Fork the repository
- Create a feature branch: `git checkout -b feature/name`
- Follow Conventional Commits for messages
- Open a pull request
MIT - see LICENSE file
Built with FastMCP and TypeScript 🚀