Why MCP When Your Agent Already Has Hands?

March 10, 2026

Tire Chains in Summer


A colleague asked which suite of MCP tools was safe to install. They were working in Claude Code, so I asked why they were installing MCP servers at all. They had a bash shell. They had gh, curl, jq--direct API access to everything they wanted to connect to.

They didn't have a security problem. They had a tire chains in summer problem.

What MCP Actually Does

Strip away the branding, and Model Context Protocol is a specific mechanism: the LLM watches its own context for a structured tool-call pattern, the harness intercepts that pattern, executes the call, and injects the result back into the context window. That's it. The model generates a JSON blob that says "call this function with these arguments," the runtime catches it before it reaches you, does the thing, and feeds the output back in.
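
Concretely, MCP frames that exchange as JSON-RPC 2.0. Here's a sketch of one round trip--the tool name will come up again later, but the repo details are invented for illustration:

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "list_issues",
    "arguments": { "owner": "example-org", "repo": "example-repo", "state": "OPEN" }
  }
}

And the result the harness injects back:

{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "content": [
      { "type": "text", "text": "[{\"number\": 26, \"title\": \"Tire Chains in Summer\"}]" }
    ]
  }
}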

This is useful. In a web chat interface where the model has no other way to reach external systems, it's the only way to give it hands. Claude.ai connecting to your Google Calendar? That's MCP solving a real problem. The model is sandboxed. It has no shell, no filesystem, no outbound network access. The protocol bridges that gap.

But here's what happens when you set up a GitHub MCP server inside Claude Code: the model generates a structured tool call, the harness intercepts it, the MCP server translates it into a GitHub API request, gets the response, serializes it back, and injects it into the context. You've added two layers of abstraction and spent tokens on the structured call, the response formatting, and the model's interpretation of the result.

Or you could just run gh pr list.

The Tire Chains Problem

Tire chains are a brilliant piece of engineering. They solve an important problem. And if you put them on your car in summer, you'll chew up the road, burn extra fuel, and drive slower than everyone around you--all while solving a problem that doesn't exist.

MCP in an agentic coding environment is tire chains in summer. You already have traction. The bash shell is the traction. Your LLM can execute commands, call APIs directly, pipe output, read files. The agent already has hands. The harness--the thing running the model--has full access to the tools. Adding MCP means asking the model to generate a structured intermediary call to a server that then makes the same API request the shell could have made directly.

It's a wrapper around a wrapper. And each layer costs tokens and cedes control.

The Quiet Tax

Let's make this concrete. I started drafting this post in Claude's web chat--an environment where MCP is the right design choice, because the model has no shell, no filesystem, no outbound network access. The tool infrastructure in that conversation's system prompt ran roughly 17,000 tokens. Google Drive search and fetch, web search, image search, sports scores, weather, recipe widgets, places lookup, memory tools, citation instructions, and pages of behavioral guidance for each one. The actual content about me--my project files, my preferences, my conversation history--accounted for about 3,000 tokens. Fifteen percent of the system prompt was about the work. Eighty-five percent was tool-call plumbing.

Side note: I asked Claude web to show me those tool definitions so I could include them here. It refused. When I wrote a previous post about system prompts--"Pay No Mind to that Guardrail"--it was happy to walk through its own context. This time, the guardrails were tighter. So I moved the conversation to Claude Code, where the model can see and report on its own infrastructure. The guardrails are a post for another day. The token counts aren't.

Here's what Claude Code's context looks like--the environment I'm writing in now:

Layer                                                             ~Tokens        What it covers
21 built-in tool schemas (Bash, Read, Write, Edit,                10,000–12,000  Everything the agent can do
  Grep, Glob, Agent, WebFetch, etc.)
Behavioral instructions (tone, security, git safety, planning)    3,000–4,000    How to behave
Project files (CLAUDE.md, memory, style guide)                    3,500          My project, my preferences
Session state (skills list, git status, date)                     500            Current context
Total                                                             ~18,000

About a fifth of that is about me and my work. The rest is plumbing. Both environments--web and Code--carry roughly the same overhead. The difference is what that overhead is for. In the web chat, MCP is the only path to external systems. The 17,000 tokens of tooling is the cost of giving a sandboxed model hands. In Claude Code, those 18,000 tokens already include a Bash tool that can reach everything MCP would connect to. The plumbing is already laid.

Now here's what happens if I install GitHub's official MCP server. It registers eighty tools. Here are two of them, as the model sees them:

{
  "name": "list_issues",
  "description": "List issues in a GitHub repository.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "owner":     { "type": "string", "description": "Repository owner" },
      "repo":      { "type": "string", "description": "Repository name" },
      "state":     { "type": "string", "enum": ["OPEN", "CLOSED"] },
      "labels":    { "type": "array", "items": { "type": "string" } },
      "orderBy":   { "type": "string", "enum": ["CREATED_AT", "UPDATED_AT", "COMMENTS"] },
      "direction": { "type": "string", "enum": ["ASC", "DESC"] },
      "since":     { "type": "string", "description": "ISO 8601 timestamp" },
      "perPage":   { "type": "number" },
      "after":     { "type": "string", "description": "Cursor for pagination" }
    },
    "required": ["owner", "repo"]
  }
}
{
  "name": "list_pull_requests",
  "description": "List pull requests in a GitHub repository.
    If the user specifies an author, DO NOT use this tool
    and use the search_pull_requests tool instead.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "owner":     { "type": "string", "description": "Repository owner" },
      "repo":      { "type": "string", "description": "Repository name" },
      "state":     { "type": "string", "enum": ["open", "closed", "all"] },
      "head":      { "type": "string", "description": "Filter by head user/org and branch" },
      "base":      { "type": "string", "description": "Filter by base branch" },
      "sort":      { "type": "string", "enum": ["created", "updated", "popularity", "long-running"] },
      "direction": { "type": "string", "enum": ["asc", "desc"] },
      "perPage":   { "type": "number" },
      "page":      { "type": "number" }
    },
    "required": ["owner", "repo"]
  }
}

That's two of eighty. Each one sits in the context window for the entire conversation whether it gets called or not. Notice the description for list_pull_requests--it's carrying behavioral instructions inside the tool schema: "DO NOT use this tool and use the search_pull_requests tool instead." That's prompt engineering baked into the plumbing. Every tool is a little system prompt of its own, and you're paying for all eighty of them on every message.
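
Back-of-the-envelope, and hedged accordingly: if the two schemas above are typical--call it 150 to 250 tokens apiece--eighty of them land somewhere between 12,000 and 20,000 tokens. That's on the order of the entire Claude Code system prompt, added before you've asked a single question.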

Here's the same information via CLI (preview of my ideas queue at no extra charge):

$ gh issue list --limit 5

26  OPEN  Tire Chains in Summer                  2026-03-07
14  OPEN  AI Slop vs notSlop                     2026-01-28
11  OPEN  Hallucinations v2 -- Longer Context... 2026-01-28
10  OPEN  Czarina Memory                         2026-01-28
 9  OPEN  Sark v2.0                              2026-01-28

$ gh pr list

(no open pull requests)

Two commands. A few dozen tokens of output. No schema overhead. No tool-call round-trip. The model already has a Bash tool--one schema that covers gh, curl, jq, git, docker, and every other CLI on the system. Adding eighty more schemas for a subset of what one of those CLIs already does isn't adding capability. It's adding a second, more expensive path to the same destination.

Four Layers, Four Jobs

There are four ways to give an LLM context, and they scale from free to expensive. Start at the top. Move down only when you need to.

Project instructions: zero tool overhead. Claude Code reads a CLAUDE.md file at the root of your project. Cursor has .cursorrules. Most agentic tools have something similar. Put your API conventions, version requirements, or documentation links there and the model has them on every message--no tool call, no external dependency, no schema. Need the model to use the right version of a library? A line in your project file costs less than any tool and applies to everything. This is the layer most people skip, and it's the one that solves most of the problems they install MCP to fix.
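
A sketch of what such a file might hold--these particular lines are invented, but the shape is the point:

# CLAUDE.md
- This project uses Python 3.12 and Django 5.x; don't suggest APIs removed in Django 4.0.
- REST conventions: plural nouns, kebab-case paths, errors per docs/errors.md.
- Library docs index lives at docs/vendor/README.md. Check it before adding dependencies.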

Skills: load on demand. Claude Code has custom skills--/publish, /sync-substack, /update-project--that are listed in my context as one-line triggers (~100 tokens total). When I invoke one, it expands into a full prompt with detailed instructions. When I don't, it's nearly invisible. That's load-on-demand: minimal always-on cost, full capability when needed. It's a design pattern that MCP could learn from--not every tool needs to be in context all the time.
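
On disk, the pattern is just a prompt file. Claude Code reads custom commands from .claude/commands/; here's a hypothetical /publish, with the steps invented:

# .claude/commands/publish.md
Build and publish the current draft:
1. Run `make build`; stop on any error.
2. Run `make linkcheck`; report broken links instead of publishing.
3. Commit the generated site and push to the publish branch.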

Direct tools: one schema, unlimited reach. Claude Code's Bash tool is about 2,000 tokens of schema and instructions. That single definition gives the model access to every CLI, every API, every file operation on the system. The tradeoff is that the model needs to know how to use each tool--it has to construct the right gh flags, the right curl headers, the right jq filter. For a coding agent working with a developer who can read the output, that's fine. The model is good at shell commands, and when it isn't, you can see exactly what it tried.
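
To see what that one schema buys, here's the kind of thing the model can compose through Bash alone--repo names invented:

$ gh api repos/example-org/example-repo/issues --jq '.[].title'
$ gh pr list --json number,title | jq '.[] | select(.title | test("fix"))'
$ curl -s https://api.github.com/repos/example-org/example-repo/releases/latest | jq -r '.tag_name'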

MCP: structured discovery for sandboxed environments. When the model has no shell--a web chat, a mobile app, an enterprise copilot running in a locked-down container--MCP is the right design. It gives the model a menu of capabilities with typed inputs and predictable outputs. A support agent that needs to check Zendesk, pull a customer's Stripe history, and file a Jira ticket in the same turn--that's three auth models and three APIs. MCP earns its overhead there because there's no shorter path.
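
When you do live in that world, registration looks something like this--a hypothetical config in the mcpServers shape most hosts share (Claude Code reads it from .mcp.json), with placeholder package names:

{
  "mcpServers": {
    "zendesk": { "command": "npx", "args": ["-y", "<your-zendesk-mcp-server>"] },
    "stripe":  { "command": "npx", "args": ["-y", "<your-stripe-mcp-server>"] },
    "jira":    { "command": "npx", "args": ["-y", "<your-jira-mcp-server>"] }
  }
}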

The Upstream Fix

The MCP server my colleague was actually asking about was Context7--a lightweight server (just two tools) that fetches current library documentation from a centralized index. The pitch: LLMs hallucinate APIs. They suggest functions that were renamed, deprecated, or never existed, because training data lags behind releases. Context7 fixes this by pulling version-specific docs at query time.

The hallucination problem is real. I've watched Claude confidently suggest method signatures that haven't existed for three major versions. But the fix Context7 is selling is, at its core, "what if the model could read the documentation?" In Claude Code, the model can already read the documentation. WebSearch finds current docs in one call. WebFetch reads them. Bash can run npm docs, check installed versions, or just execute the code and see what fails. The feedback loop is already there.
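
Those checks are one-liners--the package name here is just a stand-in:

$ npm ls zod --depth=0     # the version actually installed in this project
$ npm view zod version     # the latest published version
$ npm view zod homepage    # where the current docs live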

When the model hallucinates an API, it's usually because nobody told it which version they're using, or pointed it at the migration guide, or decomposed the problem enough for the model to know what it doesn't know. That's a specification problem, not a lookup problem. You can fix it with a line in your CLAUDE.md--"always check current documentation before using library APIs"--and it applies to every library, costs zero tokens of tool overhead, and requires no external dependency. Or you can just say it: "check the docs for this." That's a sentence, not an MCP server.

The instinct to install a tool is understandable. Developers solve problems by adding tools. But when the problem is that you didn't tell the model what you needed, the fix is learning to specify, not learning to install.

The counterargument is that standardization matters more than efficiency--that enterprise environments need a common protocol, and adopting MCP now pays off as tool ecosystems grow. That's fair. But for a single-user coding agent with shell access, the tradeoff favors simplicity and direct invocation. And where enterprises reach for MCP to standardize tool discovery across teams, project-level agent instructions achieve the same consistency at a fraction of the cost--no protocol layer, no token overhead, just a file in the repo that every agent reads.

Claude, Agent Claude

On the agentic side, MCP shifts control of tooling away from the agent runtime and into the protocol layer. With direct tool calls, the agent owns the tool lifecycle--discovery, invocation, error handling, all of it. With MCP, the protocol owns tool discovery. That's a deliberate architectural choice, and in enterprise environments with shared tooling it's often the right one. But for an autonomous agent operating in an environment it already controls, you're asking your agent to fill out forms in triplicate before sipping his martini.

You gave the agent a shell. You gave it access to your filesystem, your APIs, your CLIs. That's the license to kill.

The question isn't whether MCP is good--it's whether it's the right tool for the job at hand.


James Henry is a senior engineer who checked the overhead, saw the shell was right there, and ran gh pr list. He works with LLMs liberally--including in the writing of this post--because when the agent already has hands, the collaboration is the point, not the protocol.