Model Context Protocol: Connecting LLMs to External Tools
Large language models are starting to feel less like isolated black boxes and more like real software components. The main reason is simple: we are getting better at connecting them to tools, data, and other services in a structured way.
Model Context Protocol (MCP) is one of the more promising efforts here. Instead of every vendor inventing a slightly different way to do "tools" or "function calling," MCP tries to standardize how an LLM can talk to external systems.
If you are already building RAG systems, integrating vector databases, or orchestrating agents, MCP gives you a cleaner way to expose all of that to an LLM client.
What is Model Context Protocol, really?
At its core, Model Context Protocol is a protocol for describing and calling external capabilities in a way that LLMs and clients can understand.
Instead of "here is a random HTTP endpoint, good luck," MCP gives you:
- A standard way to describe tools and resources
- A standard way to call them and stream responses
- A standard way for clients (IDEs, UIs, agents) to discover what is available
You can think of it as:
- For LLMs: a typed toolbox and filesystem
- For tools: a contract to be callable by any MCP-compatible client
- For developers: a common language for connecting LLMs to everything else
This sits nicely next to concepts from Retrieval-Augmented Generation and Hybrid Search. Instead of hard-wiring your RAG retrieval flow into a single app, you can expose your retrieval, ranking, and post-processing as MCP tools and resources.
Core ideas: tools, resources, prompts
MCP introduces a few central abstractions.
Tools
Tools are operations the model can call. Conceptually similar to OpenAI function calling or LangChain tools, but standardized.
A tool has:
- A name
- A JSON schema for its arguments
- A description of what it does
- A way to stream back a result
Example tool concepts in a RAG-heavy system:
- `semantic_search` - run dense retrieval against your vector DB
- `keyword_search` - use a traditional search index
- `run_sql` - query a data warehouse
- `send_email`, `create_jira_ticket` - take real-world actions
Resources
Resources are like files or documents that the model can read. They might be static or dynamic.
Examples:
- Configuration files
- Knowledge base documents
- User profile settings
- Logs or monitoring data
A resource can usually be:
- Listed (discoverability)
- Read by path or ID
- Sometimes watched for updates
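To make that concrete, here is a minimal in-memory sketch of a resource registry. The `config://` and `kb://` URIs and the function names are illustrative, not the official MCP API:

```python
from typing import Dict, List

# Illustrative in-memory resource registry; a real MCP server exposes
# resources over its transport with URIs, MIME types, and subscriptions.
RESOURCES: Dict[str, str] = {
    "config://app/settings": '{"chunk_size": 512, "top_k": 5}',
    "kb://docs/privacy-policy": "User data is retained for at most 30 days.",
}

def list_resources() -> List[str]:
    """Discoverability: return the URIs of every available resource."""
    return sorted(RESOURCES)

def read_resource(uri: str) -> str:
    """Read a resource by URI; fail loudly for unknown URIs."""
    if uri not in RESOURCES:
        raise KeyError(f"Unknown resource: {uri}")
    return RESOURCES[uri]
```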
Prompts
Some MCP implementations also expose reusable prompt templates as first-class objects.
If you have standardized system prompts for:
- RAG answer style
- Privacy-aware redaction
- Evaluation instructions
You can register these as prompts the client can fetch and apply instead of pasting them manually everywhere.
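A sketch of what such a registry could look like, using Python's `string.Template`; the prompt names and wording here are made up for illustration:

```python
from string import Template
from typing import Dict

# Hypothetical registered prompts; clients fetch these instead of
# pasting the same system prompt into every app.
PROMPTS: Dict[str, Template] = {
    "rag_answer_style": Template(
        "Answer using only the provided context. Cite sources as [doc_id]. "
        "Question: $question"
    ),
    "privacy_redaction": Template(
        "Redact personal data from the following text before answering: $text"
    ),
}

def get_prompt(name: str, **kwargs: str) -> str:
    """Fetch a registered prompt template and fill in its variables."""
    return PROMPTS[name].substitute(**kwargs)
```

A client could then call `get_prompt("rag_answer_style", question=...)` rather than maintaining its own copy of the template.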
Where MCP fits in your architecture
Before writing any code, it helps to see where MCP sits relative to your existing stack.
Imagine a typical production RAG architecture:
- Vector database for dense retrieval
- Sometimes a keyword or BM25 index for sparse retrieval
- Custom pre-processing and chunking logic
- One or more LLMs (hosted or self-hosted)
- A web app or API layer on top
MCP fits between:
- The client (IDE, chat UI, agent framework)
- The services/tools you already have
Instead of your chat UI knowing directly how to talk to your vector DB, databases, monitoring endpoints, and so on, the UI only needs to speak MCP.
The benefits:
- Tools become reusable between different clients
- Less glue code per client
- Easier to expose new capabilities without UI changes
Implementing a simple MCP server in Python
Let us walk through a minimal MCP-style server that exposes two tools:
- `search_docs` - proxy to a semantic search endpoint
- `get_user_profile` - a simple key-value store
I will keep the transport generic and focus on the data structures and flows, since actual MCP implementations often sit on top of JSON-RPC over stdio, WebSockets, or HTTP.
Defining tools with JSON schema
We will define tools using Python dataclasses for type safety, then make them JSON serializable so clients can inspect them.
```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass
class ToolSchema:
    name: str
    description: str
    parameters: Dict[str, Any]  # JSON Schema


search_docs_tool = ToolSchema(
    name="search_docs",
    description="Semantic search over internal documentation.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "User query"},
            "top_k": {"type": "integer", "default": 5, "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
)

get_user_profile_tool = ToolSchema(
    name="get_user_profile",
    description="Get the current user's profile information.",
    parameters={
        "type": "object",
        "properties": {
            "user_id": {"type": "string"},
        },
        "required": ["user_id"],
        "additionalProperties": False,
    },
)

TOOLS: Dict[str, ToolSchema] = {
    t.name: t for t in [search_docs_tool, get_user_profile_tool]
}


def list_tools() -> List[Dict[str, Any]]:
    """Return tool descriptions for clients to discover."""
    return [asdict(tool) for tool in TOOLS.values()]
```
This gives us a discoverable interface. Any MCP client can ask the server "What tools do you support?" and get a machine-readable answer.
Wiring tools to actual logic
Next, we connect those tools to real Python functions.
```python
from typing import Callable


def semantic_search_backend(query: str, top_k: int = 5) -> List[Dict[str, Any]]:
    # Dummy response for illustration
    return [
        {"id": "doc_1", "score": 0.92, "snippet": "..."},
        {"id": "doc_2", "score": 0.88, "snippet": "..."},
    ][:top_k]


USER_PROFILES = {
    "user_123": {"name": "Alice", "role": "engineer"},
    "user_456": {"name": "Bob", "role": "data_scientist"},
}


def handle_search_docs(args: Dict[str, Any]) -> Dict[str, Any]:
    query = args["query"]
    top_k = args.get("top_k", 5)
    results = semantic_search_backend(query, top_k)
    return {"results": results}


def handle_get_user_profile(args: Dict[str, Any]) -> Dict[str, Any]:
    user_id = args["user_id"]
    profile = USER_PROFILES.get(user_id)
    if profile is None:
        return {"error": f"User {user_id} not found"}
    return {"profile": profile}


TOOL_HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "search_docs": handle_search_docs,
    "get_user_profile": handle_get_user_profile,
}


def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    if name not in TOOL_HANDLERS:
        raise ValueError(f"Unknown tool: {name}")
    handler = TOOL_HANDLERS[name]
    return handler(arguments)
```
The MCP contract here is simple: once the model chooses a tool and provides JSON arguments, the server runs the corresponding Python function and returns the result (in a full implementation, possibly streamed).
Simple JSON-RPC transport example
To keep the example concrete, here is a minimal HTTP JSON-RPC style interface using FastAPI.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class RpcRequest(BaseModel):
    method: str
    params: Dict[str, Any] | None = None


@app.get("/tools")
async def http_list_tools():
    return {"tools": list_tools()}


@app.post("/call")
async def http_call_tool(req: RpcRequest):
    if req.method not in TOOL_HANDLERS:
        raise HTTPException(status_code=404, detail="Unknown tool")
    try:
        result = call_tool(req.method, req.params or {})
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    return {"result": result}
```
This is not a full MCP reference implementation, but it shows the core building blocks:
- Tools are discoverable
- Arguments are validated by schema
- Calls and responses are structured and machine readable
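For comparison with the HTTP shim above, real MCP implementations typically frame this exchange as JSON-RPC 2.0 messages. Here is a sketch of a tool-call round trip; the `tools/call` method name follows MCP's convention, but treat the exact payload shape as illustrative:

```python
import json

# JSON-RPC 2.0 request a client sends once the model picks a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "data retention policy", "top_k": 3},
    },
}

# The server runs the matching handler and answers with the same id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"results": [{"id": "doc_1", "score": 0.92, "snippet": "..."}]},
}

# What actually crosses stdio, a WebSocket, or an HTTP body:
wire = json.dumps(request)
```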
Using MCP for RAG and tool-augmented LLMs
Much of the value of MCP appears when you start composing it with Retrieval-Augmented Generation and agentic patterns.
Exposing RAG as tools and resources
If you have a RAG pipeline, you can split it into:
- Retrieval tools
  - `dense_search` using embeddings
  - `sparse_search` using BM25 or keyword matching
- Reranking tools
  - `rerank_passages` with a cross encoder
- Post-processing resources
  - `rag_config` describing chunk sizes, similarity thresholds, etc.
The LLM client can then:
- Inspect available tools
- Decide which retrieval strategy is appropriate
- Call the right tool
- Use the results as context for its response
This mirrors the agentic RAG idea, where the model actively decides how and when to retrieve, but with a standard interface that any MCP-aware client could orchestrate.
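A toy version of that loop, with stub tools and a hard-coded heuristic standing in for the LLM's decision (in a real system the model itself picks the tool based on the descriptions it discovered):

```python
from typing import Callable, Dict, List

# Stub retrieval tools; in practice these would be MCP tool calls.
def dense_search(query: str) -> List[str]:
    return [f"dense hit for {query!r}"]

def sparse_search(query: str) -> List[str]:
    return [f"keyword hit for {query!r}"]

TOOLBOX: Dict[str, Callable[[str], List[str]]] = {
    "dense_search": dense_search,
    "sparse_search": sparse_search,
}

def pick_tool(query: str) -> str:
    """Toy stand-in for the LLM's choice: quoted or identifier-like
    queries go to keyword search, everything else to dense retrieval."""
    return "sparse_search" if '"' in query or "_" in query else "dense_search"

def retrieve(query: str) -> List[str]:
    return TOOLBOX[pick_tool(query)](query)
```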
Orchestrating multiple services with MCP
When you start combining:
- RAG tools
- Monitoring tools (like `get_model_metrics`)
- Privacy tools (like `redact_pii`)
You end up with a growing toolbox that agents or LLMs can use.
When a query touches sensitive data, an MCP-compatible multi-agent system could decide to:

- Call `redact_pii` before sending anything to a third-party LLM
- Call `semantic_search_internal` for regulated documents
- Log decisions via a `log_audit_event` tool
Instead of writing a new custom integration each time, you just register new tools.
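A compressed sketch of that flow, with toy implementations of the three tools (the tool names come from the list above; the redaction regex is deliberately simplistic):

```python
import re
from typing import Dict, List

AUDIT_LOG: List[Dict[str, str]] = []

def redact_pii(text: str) -> str:
    """Toy redaction: mask email addresses before text leaves the boundary."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)

def semantic_search_internal(query: str) -> List[str]:
    return [f"internal doc matching {query!r}"]

def log_audit_event(event: str, detail: str) -> None:
    AUDIT_LOG.append({"event": event, "detail": detail})

def handle_sensitive_query(query: str) -> List[str]:
    """Compose the tools: redact, search internally, record the decision."""
    safe_query = redact_pii(query)
    log_audit_event("sensitive_query", safe_query)
    return semantic_search_internal(safe_query)
```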
Designing good MCP tools
Just like with function calling and classic API design, the structure of your tools matters a lot.
Keep tools small and focused
Avoid tools that do a dozen things based on a mode argument. It is much easier for a model to choose between:
- `get_user_profile`
- `get_user_permissions`
- `update_user_settings`

than a single monster tool `user_operation` with a `type` field.
Use clear, realistic descriptions
Tool descriptions are consumed by the LLM. They should:
- State what the tool does
- Mention limitations
- Mention latency or cost if relevant (for example, "expensive, use sparingly")
Example:
```python
cleanup_logs_tool = ToolSchema(
    name="cleanup_logs",
    description=(
        "Delete application logs older than the specified number of days. "
        "Irreversible, use only when the user explicitly requests log cleanup."
    ),
    parameters={
        "type": "object",
        "properties": {
            "days": {"type": "integer", "minimum": 1, "maximum": 365},
        },
        "required": ["days"],
    },
)
```
Think about privacy and compliance
When exposing tools to LLMs, be explicit about data sensitivity:
- Make some tools "safe by construction," for example `search_public_docs` vs `search_sensitive_docs`
- Add guardrails inside tools, not just in the LLM prompt
- Consider using dedicated tools to anonymize or tokenize data before passing it to the LLM
You can also expose privacy policies as MCP resources, so the model can reference them when answering questions.
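One way to put a guardrail inside the tool is a role check that runs on every call, no matter what the prompt asked for. A minimal sketch, where the ACL shape and role names are assumptions for illustration:

```python
from typing import Dict, List, Set

# Toy ACL: which roles may search which collections. Because the check
# lives inside the tool, a cleverly worded prompt cannot bypass it.
COLLECTION_ACL: Dict[str, Set[str]] = {
    "public_docs": {"viewer", "engineer", "admin"},
    "sensitive_docs": {"admin"},
}

def search_collection(collection: str, query: str, caller_role: str) -> List[str]:
    """Refuse the call outright when the caller's role is not allowed."""
    if caller_role not in COLLECTION_ACL.get(collection, set()):
        raise PermissionError(
            f"role {caller_role!r} may not search {collection!r}"
        )
    return [f"{collection} result for {query!r}"]
```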
Testing and evaluating MCP-powered systems
Once you expose many capabilities via MCP, you need systematic evaluation.
Unit test tools directly
Since tools are regular Python functions, you can write straightforward tests:
```python
def test_search_docs_returns_results():
    result = handle_search_docs({"query": "data privacy", "top_k": 3})
    assert "results" in result
    assert len(result["results"]) <= 3
```
This is simpler than mocking entire LLM conversations. For higher-level behavior, you can run scenario-based tests where the LLM is expected to pick the right tools.
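A scenario test can pin down which tools should be chosen for a given request. Here a stub policy stands in for the LLM; in production you would record the tools the real model actually invoked and assert on that trace:

```python
from typing import List, Tuple

def choose_tools(user_message: str) -> List[str]:
    """Stub tool-selection policy standing in for the LLM's plan."""
    tools: List[str] = []
    if "email" in user_message.lower():
        tools.append("redact_pii")
    tools.append("search_docs")
    return tools

SCENARIOS: List[Tuple[str, List[str]]] = [
    ("Find the retention policy", ["search_docs"]),
    ("Summarize the email from alice@example.com", ["redact_pii", "search_docs"]),
]

def test_tool_selection_scenarios():
    for message, expected in SCENARIOS:
        assert choose_tools(message) == expected
```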
Log and analyze tool usage
For production, you should log:
- Which tools are called
- Argument distributions
- Error rates
- Latencies
This integrates naturally with your existing monitoring and CI/CD pipelines.
It also gives you concrete feedback about which capabilities the LLM struggles to use, which tools are redundant, and where latency is hurting UX.
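A lightweight way to get those logs is to wrap every handler before registering it. A sketch; in production you would emit to your metrics backend instead of an in-memory list:

```python
import time
from typing import Any, Callable, Dict, List

TOOL_METRICS: List[Dict[str, Any]] = []

def with_metrics(
    name: str, handler: Callable[[Dict[str, Any]], Any]
) -> Callable[[Dict[str, Any]], Any]:
    """Wrap a tool handler so every call records name, latency, and outcome."""
    def wrapped(args: Dict[str, Any]) -> Any:
        start = time.perf_counter()
        ok = False
        try:
            result = handler(args)
            ok = True
            return result
        finally:
            TOOL_METRICS.append({
                "tool": name,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "ok": ok,
            })
    return wrapped

# Register wrapped handlers exactly like plain ones.
echo = with_metrics("echo", lambda args: args)
echo({"q": "hello"})
```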
Where MCP is heading and how to prepare
MCP is still evolving, but the direction is clear: more standardized, composable LLM tooling.
To future-proof your system design:
- Treat your tools as first-class APIs with stable contracts
- Separate "business logic" from "LLM orchestration logic"
- Prefer schemas and explicit typing over ad hoc JSON blobs
If you are already comfortable with agent frameworks and tool-calling patterns, you are well positioned. MCP is less about a new theoretical idea and more about cleaning up the messy edges between LLMs and the rest of your stack.
Once you start thinking in terms of "capabilities exposed via MCP" rather than "hard coded chains inside a given app," your architecture diagrams become much more modular.
Key Takeaways
- Model Context Protocol standardizes how LLMs discover and call external tools, resources, and prompts.
- Tools in MCP are small, focused operations with JSON schemas, similar to function calling but vendor neutral.
- Resources act like a virtual filesystem that models can read from, including configs, documents, and policies.
- MCP fits naturally with RAG, letting you expose dense, sparse, and hybrid retrieval as explicit tools.
- Python servers can implement MCP-like behavior using clear tool schemas, handlers, and a simple JSON-RPC or HTTP transport.
- Good tool design emphasizes clarity, safety, privacy, and separation of concerns between logic and orchestration.
- Logging and evaluating tool usage is essential to keep MCP-powered systems reliable and cost effective at scale.