Building a Local Knowledge Base MCP Server: When Your Documentation Won’t Fit in Context

The Problem

I wanted to chat with Claude/Gemini about my local technical notes, but they were far too large to fit in conversation context: thousands of pages of AWS notes and system design docs. Copy-pasting wasn’t practical, and uploading to external services felt wrong.

Solution? Build an MCP (Model Context Protocol) server that lets AI assistants query my local docs on-demand.

The Over-Engineering Phase 🏗️

Went full “production mode”: query enhancement, cross-encoder re-ranking, semantic chunking, multi-signal scoring (Standard RAG → Corrective RAG → Fusion RAG). 500 lines across 8 modules.

Why the complexity? I wanted to learn different RAG patterns hands-on. But surely more sophisticated = better results, right?

The Reality Check

Then I measured with real queries:

  • R@5 (recall@5): 80% (failed on “AWS Aurora failover time”)
  • Latency: 381ms (sluggish)
  • Code: Complex, hard to debug

The Pivot

Stripped to essentials: Hybrid Search (BM25 keyword + Vector semantic). Fixed-size chunking. Simple fusion (30% keywords, 70% semantics). 200 lines total.
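A minimal sketch of that fusion step (the min-max normalization and the ranking helper are my assumptions about what “simple fusion” means; the repo may combine raw scores differently):

```python
# Weighted score fusion: normalize each signal to [0, 1], then blend
# 30% keyword (BM25) with 70% semantic (vector similarity).
def fuse(bm25_scores, vector_scores, kw_weight=0.3, vec_weight=0.7):
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    b, v = minmax(bm25_scores), minmax(vector_scores)
    return [kw_weight * kb + vec_weight * kv for kb, kv in zip(b, v)]

# Rank documents by fused score, best first.
def rank(doc_ids, bm25_scores, vector_scores, top_k=5):
    fused = fuse(bm25_scores, vector_scores)
    order = sorted(zip(doc_ids, fused), key=lambda pair: -pair[1])
    return [doc_id for doc_id, _ in order[:top_k]]
```

A document that scores high on either signal can still win overall, which is why exact-keyword queries like “AWS Aurora failover time” stopped failing.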

Results:

  • R@5: 100% (every query found the right document)
  • Latency: 23ms (16x faster)
  • Maintainability: Actually understandable

Key Learnings:

  • Standard RAG (vector only): missed exact keyword matches
  • Corrective RAG (cross-encoder re-ranking): added 380ms of latency, zero accuracy gain
  • Fusion RAG (BM25 + vector): the winner! Caught both keywords AND semantics
  • Semantic chunking: overcomplicated; fixed-size worked fine
  • Query enhancement: diluted keywords, reduced accuracy
  • Graph RAG: never explored it; hybrid search solved the problem first

Why This Matters for Enterprise

This MCP approach could transform internal documentation + AI:

  1. Privacy-First: Docs stay on-premises, AI queries locally
  2. Context-On-Demand: Only retrieve relevant chunks, avoid token limits
  3. Vendor-Agnostic: Works with Claude, GPT, Gemini, future models
  4. Agent-Ready: When agentic AI becomes reliable, agents can search internal Confluence spaces, wikis, and SOPs without exposing the full corpus

Picture your AI assistant querying architecture docs, security policies, and incident reports through a controlled MCP interface: full audit trail, and it works with any MCP-supporting vendor.

The Real Lesson

Measure everything. My intuition said “complex = better.” Data said “simple hybrid = perfect accuracy + 16x faster.”

For RAG: Start simple, test ruthlessly, add complexity only when metrics demand it.

Built with: FastMCP, ChromaDB, LangChain, sentence-transformers, rank-bm25.

https://github.com/j3r3myfoobar/knowledge_base_mcp
