Building a Local Knowledge Base MCP Server: When Your Documentation Won’t Fit in Context
The Problem
I wanted to chat with Claude/Gemini about my local technical notes, but they were far too large to fit in conversation context: thousands of pages of AWS notes and system design docs. Copy-pasting wasn’t practical, and uploading everything to external services felt wrong.
Solution? Build an MCP (Model Context Protocol) server that lets AI assistants query my local docs on-demand.
The Over-Engineering Phase 🏗️
Went full “production mode”: query enhancement, cross-encoder re-ranking, semantic chunking, multi-signal scoring (Standard RAG → Corrective RAG → Fusion RAG). 500 lines across 8 modules.
Why the complexity? I wanted to learn different RAG patterns hands-on. But surely more sophisticated = better results, right?
The Reality Check
Then I measured with real queries:
- R@5: 80% (failed on “AWS Aurora failover time”)
- Latency: 381ms (sluggish)
- Code: Complex, hard to debug
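Measuring R@5 (Recall at 5) just means checking, for a set of hand-labeled queries, whether the expected document lands in the top five results. A minimal sketch of that evaluation loop is below; the function and file names are illustrative, not the repo's actual code, and the toy search function stands in for the real retriever.

```python
def recall_at_k(search, labeled_queries, k=5):
    """Fraction of queries whose expected doc appears in the top-k results."""
    hits = 0
    for query, expected_doc in labeled_queries:
        results = search(query)[:k]
        if expected_doc in results:
            hits += 1
    return hits / len(labeled_queries)

# Toy retriever standing in for the real hybrid search.
def toy_search(query):
    corpus = {
        "AWS Aurora failover time": ["aurora.md", "rds.md"],
        "VPC peering limits": ["vpc.md"],
    }
    return corpus.get(query, [])

cases = [
    ("AWS Aurora failover time", "aurora.md"),
    ("VPC peering limits", "vpc.md"),
]
print(recall_at_k(toy_search, cases))  # 1.0
```

A dozen or two labeled queries like this is enough to catch regressions every time the retrieval pipeline changes.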
The Pivot
Stripped to essentials: Hybrid Search (BM25 keyword + Vector semantic). Fixed-size chunking. Simple fusion (30% keywords, 70% semantics). 200 lines total.
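The fusion step can be this simple: normalize each retriever's raw scores so they're comparable, then take a weighted sum. The sketch below assumes per-document score dicts from BM25 and vector search; the 30/70 split matches the weights above, but the doc names and min-max normalization scheme are illustrative.

```python
def fuse_scores(bm25_scores, vector_scores, w_keyword=0.3, w_semantic=0.7):
    """Combine per-document scores from BM25 and vector search.

    Each input maps doc_id -> raw score. Scores are min-max normalized
    per query so the two signals live on the same 0..1 scale before
    weighting. Returns doc_ids sorted best-first.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    docs = set(bm25) | set(vec)
    fused = {d: w_keyword * bm25.get(d, 0.0) + w_semantic * vec.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

ranking = fuse_scores(
    {"aurora.md": 12.4, "rds.md": 3.1},     # BM25: exact keyword hits
    {"rds.md": 0.82, "failover.md": 0.79},  # vector: semantic similarity
)
print(ranking)  # ['rds.md', 'aurora.md', 'failover.md']
```

Documents missing from one retriever's results simply score zero on that signal, so a doc that both retrievers like rises to the top.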
Results:
- R@5: 100% (every query found the right document)
- Latency: 23ms (16x faster)
- Maintainability: Actually understandable
Key Learnings:
- Standard RAG (vector only): missed exact keywords
- Corrective RAG (cross-encoder re-ranking): added 380ms, zero accuracy gain
- Fusion RAG (BM25 + vector): the winner, catching both keywords and semantics
- Semantic chunking: overcomplicated; fixed-size worked fine
- Query enhancement: diluted the original keywords, reduced accuracy
- Graph RAG: never explored it; hybrid search solved the problem first
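The fixed-size chunker that beat semantic chunking can be as small as a character window with overlap, so sentences cut at a boundary still appear whole in one chunk. This is a sketch; the 500/50 numbers are illustrative, not the repo's actual settings.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size character chunks with a small overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # advance less than a full chunk to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "x" * 1200
chunks = chunk_text(doc, size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]
```

No embedding calls, no sentence splitting, nothing to tune beyond two integers, and in this case it retrieved just as well.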
Why This Matters for Enterprise
This MCP approach could transform how enterprises pair internal documentation with AI:
- Privacy-First: Docs stay on-premises, AI queries locally
- Context-On-Demand: Only retrieve relevant chunks, avoid token limits
- Vendor-Agnostic: Works with Claude, GPT, Gemini, future models
- Agent-Ready: When agentic AI becomes reliable, agents can search internal Confluence spaces, wikis, and SOPs without exposing the full corpus
Picture: Your AI assistant querying architecture docs, security policies, incident reports through a controlled MCP interface. Full audit trail, works with any MCP-supporting vendor.
The Real Lesson
Measure everything. My intuition said “complex = better.” Data said “simple hybrid = perfect accuracy + 16x faster.”
For RAG: Start simple, test ruthlessly, add complexity only when metrics demand it.
Built with: FastMCP, ChromaDB, LangChain, sentence-transformers, rank-bm25.
https://github.com/j3r3myfoobar/knowledge_base_mcp