Synaptic Labs Blog

The Tool Bloat Tipping Point

Written by Professor Synapse | Jan 20, 2026 4:00:01 PM

Series: Bounded Context Packs (Part 1 of 4)

"Every master of AI integration discovers this truth: the difference between a tool collection and a tool catastrophe lies not in what you can do, but in how thoughtfully you organize what you offer."
— Professor Synapse

The Moment I Knew

I was building an Obsidian integration. The kind where an AI assistant could read your notes, search your vault, manage files, track workspaces, and remember context across sessions. MCP (Model Context Protocol) had just launched, and the possibilities felt limitless.

So I built tools. Lots of them.

Content management needed read, write, and update operations. Storage management needed list, move, copy, archive, and open. Search needed content search, directory search, and memory search. The memory system required workspace management, state snapshots, session tracking.

Six agents. Thirty-three tools. Each tool with a schema defining its parameters, types, descriptions, and requirements.

Then I looked at what was actually being sent to the model at the start of every conversation: seven thousand tokens of tool definitions. Before a single user message. Before any actual work.
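To make that concrete, here's roughly what a single tool definition looks like on the wire. The shape follows MCP's tool-listing format, but the tool itself and its fields are illustrative, not my actual Obsidian schemas:

```typescript
// One illustrative definition out of thirty-three. Every name,
// description, and parameter below is text the model sees before the
// conversation starts, and all of it counts against the context window.
const readNoteTool = {
  name: "content_read_note", // hypothetical name
  description:
    "Read the contents of a note in the vault. Returns the markdown body " +
    "and frontmatter. Fails if the path does not exist.",
  inputSchema: {
    type: "object",
    properties: {
      path: {
        type: "string",
        description: "Vault-relative path to the note",
      },
      includeFrontmatter: {
        type: "boolean",
        description: "Whether to include YAML frontmatter in the result",
      },
    },
    required: ["path"],
  },
};
// Serialized, a definition like this runs one to two hundred tokens.
// Thirty-three of them is how you arrive at seven thousand.
```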

The model was spending more context understanding what it could do than actually doing anything.

And that's when the question hit me: What if the model only loaded what it needed?

The Mathematics of Chaos

My situation wasn't unique. The tool explosion is mathematical, and it hits every serious MCP integration.

Consider what a comprehensive platform integration actually requires:

HubSpot CRM: Companies (6 operations) + Contacts (6) + Deals (6) + Notes (8) + Associations (11) + Blog Posts (6) = 43 tools, barely scratching the surface.

Salesforce: Leads, Accounts, Opportunities, Cases, Custom Objects, Workflows, Reports. Easily 60+ tools for basic coverage.

Slack: Channels, Users, Messages, Apps, Workflows, Administration. Another 40+ tool integration.

The breaking point isn't the raw numbers. It's cognitive load. AI assistants make decisions based on available options, and research consistently shows that decision quality degrades as options multiply. Give a model 50 similar-sounding tools and watch it hesitate, second-guess, and pick the wrong one entirely.

"Should I use contacts_create or crm_contacts_add? What about contacts_upsert?"

Every hesitation burns tokens. Every wrong choice requires correction. Every correction burns more tokens.

Now consider local models. Cloud providers keep expanding context windows, which now range anywhere from 128K to 1M+ tokens. But local models running on consumer hardware? They struggle with 8-32K context. Memory is the bottleneck, and it's not improving as fast as model quality.

If your architecture assumes abundant context, you've locked out local inference entirely. The tool bloat problem determines whether local-first AI is even possible for a personal agent.

The Insight: Domain Boundaries

The conceptual unlock came from Eric Evans' Domain-Driven Design, specifically the idea of bounded contexts. Complex systems become manageable when organized around business domains rather than technical structures.

Applied to MCP tools, this meant recognizing that content operations ≠ storage operations ≠ search operations. They have different lifecycles, different concerns, different workflows:

  • Content operations focus on what's inside files: reading, writing, updating text.
  • Storage operations focus on where files are: moving, copying, organizing structure.
  • Search operations focus on finding things: queries across content, directories, memory.
  • Memory operations focus on remembering: sessions, workspaces, state snapshots.

Users already think in these boundaries intuitively. "I need to read a file" lives in a different mental space than "I need to find files matching a pattern."
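In code, those boundaries can be as simple as a domain registry. This is a sketch of the idea, not the production implementation; the domain names mirror the list above, and the tool names are placeholders:

```typescript
// Sketch of a domain registry. Each domain is a bounded context: its
// tools share a lifecycle and a mental model, so they can be described,
// loaded, and unloaded as a unit.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema for the tool's parameters
}

const domains: Record<string, ToolDef[]> = {
  content: [], // read, write, update ...
  storage: [], // list, move, copy, archive, open ...
  search: [],  // content search, directory search, memory search ...
  memory: [],  // workspaces, snapshots, session tracking ...
};

// Discovery can now answer with four domain summaries instead of
// thirty-three full schemas:
const summaries = Object.entries(domains).map(([name, tools]) => ({
  domain: name,
  toolCount: tools.length,
}));
```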

What if tools were organized the same way? What if, instead of presenting 33 tools upfront, you presented domains? And what if the model could load only the domain it needed for the current task?

The architecture crystallized: a meta-layer for discovery, domain-specific tool collections, and on-demand loading. Start with two tools that reveal available domains. Load specific tools on demand. Keep context lean.

I had the pattern. I just couldn't prove it worked.

The Gap: What MCP Clients Don't Do

Here's where theory collided with reality.

MCP as a protocol supports dynamic tool registration. A server can add or remove tools, and clients can be notified of changes. The architecture I envisioned was technically possible.

But Claude Desktop, the primary MCP client most developers were using, didn't support it. Tools registered at startup stayed registered. You couldn't progressively disclose capabilities. You couldn't load domains on demand. It was all or nothing.

I had an architecture in my head that I couldn't implement in the tool everyone was using.

So I sat with it. Kept building. Kept running into the same walls.

Validation: Anthropic Sees It Too

Then, in November 2025, Anthropic published "Code execution with MCP: Building more efficient agents."

The core argument: tool definitions overload context windows. Intermediate results consume additional tokens. The solution? Progressive disclosure. Present tools as a filesystem. Let the model explore and load only what it needs.

They showed examples of 98.7% token reduction. They introduced the concept of "skills" as reusable capability bundles. They validated, with benchmarks and production experience, exactly the pattern I'd been stuck on for months.

Around the same time, Claude Skills launched in the product. Bounded packs of capabilities that load contextually. The ecosystem was implementing what I'd been trying to build.

I wasn't wrong about the architecture. I just hadn't figured out how to build it yet.

The Promise

The pattern that emerged has two entry points:

  1. getTools: A discovery tool that reveals available agents and their tools
  2. useTools: An execution tool that runs tools with unified context

Instead of 33 tools consuming 7K tokens at startup, you start with 2. The model discovers what's available, requests what it needs, and works with a focused toolset that matches the actual task.
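Here's a minimal sketch of those two entry points using the TypeScript MCP SDK. The `domains` registry, tool names, and response shapes are illustrative assumptions, not the production system's code:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical registry: each domain exposes its tool schemas plus a
// dispatcher that runs one of its tools. Stubbed here for brevity.
const domains: Record<
  string,
  { schemas: object[]; run: (tool: string, args: unknown) => Promise<string> }
> = {
  content: {
    schemas: [{ name: "content_read_note" /* ...full schema... */ }],
    run: async (tool, args) => `ran ${tool} with ${JSON.stringify(args)}`,
  },
  // storage, search, memory ...
};

const server = new McpServer({ name: "bounded-context-packs", version: "0.1.0" });

// Entry point 1: discovery. With no argument it lists domains; with a
// domain it returns that domain's full tool schemas as a tool *result*,
// sidestepping the client's lack of runtime registration.
server.tool(
  "getTools",
  "List available domains, or the tool schemas within one domain",
  { domain: z.string().optional() },
  async ({ domain }) => ({
    content: [{
      type: "text",
      text: JSON.stringify(domain ? domains[domain].schemas : Object.keys(domains)),
    }],
  })
);

// Entry point 2: execution. Routes any named tool call into its domain.
server.tool(
  "useTools",
  "Execute a specific tool from a domain with the given arguments",
  { domain: z.string(), tool: z.string(), args: z.record(z.unknown()) },
  async ({ domain, tool, args }) => ({
    content: [{ type: "text", text: await domains[domain].run(tool, args) }],
  })
);

await server.connect(new StdioServerTransport());
```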

The constraint that drove this design (limited context windows) turned into its greatest strength. An architecture that works at 8K context works everywhere. Cloud models benefit from cost savings. Local models become viable. The same pattern scales in both directions.

What's Coming

This article established the problem and the insight. The rest of this series goes deeper:

  • Article 2: The Meta-Tool Pattern explores how progressive disclosure actually works, with the conceptual architecture and Anthropic's validation
  • Article 3: From Theory to Production opens the hood on a real implementation, with code examples from an open-source system running in production
  • Article 4: Patterns They Didn't Cover shares what months of production use revealed: batch operations, session context, cross-domain routing, and the patterns that only emerge under real load

I knew the solution. I just couldn't build it yet.

So I learned.

Frequently Asked Questions

How many MCP tools is too many?

There's no hard limit, but problems typically start around 20-30 tools. By 40+ tools, you'll notice degraded response quality, slower processing, and the model struggling to select the right tool. The issue isn't the number itself; it's the token overhead. Each tool schema consumes 200-400 tokens. At 50 tools, you're spending 10,000-20,000 tokens before the conversation even starts.

Why is Claude Desktop slow with multiple MCP servers?

Each MCP server contributes its tool schemas to Claude's context window. If you're running HubSpot (43 tools), Slack (40 tools), and a custom integration (20 tools), that's 100+ tool schemas loaded at startup. Claude must parse all of them for every request, even if you're only asking about the weather. The solution isn't fewer servers; it's smarter tool loading.

Can I dynamically load MCP tools at runtime?

Not in Claude Desktop. The MCP implementation loads all tool schemas at startup and doesn't support runtime registration changes. This is a platform limitation, not a protocol limitation. The workaround is the meta-tool pattern: register two tools (getTools and useTools) that can return and execute other tool schemas on demand.
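The flow looks something like this (payload shapes simplified for illustration; these aren't actual MCP wire captures):

```typescript
// Step 1: discovery. Only two schemas are registered, so startup is cheap.
const discover = { tool: "getTools", arguments: {} };
// -> ["content", "storage", "search", "memory"]

// Step 2: drill into the relevant domain. Schemas arrive as tool results,
// which Claude Desktop handles fine, rather than as new registrations,
// which it doesn't.
const drillDown = { tool: "getTools", arguments: { domain: "search" } };
// -> full schemas for the search tools only

// Step 3: execute through the single useTools entry point.
const execute = {
  tool: "useTools",
  arguments: {
    domain: "search",
    tool: "search_content",
    args: { query: "meeting notes" },
  },
};
```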

What's the difference between MCP tool bloat and context window limits?

Context window limits are about total conversation length. Tool bloat is about baseline overhead: the tokens consumed before any conversation happens. A 128K context window sounds huge until 7K is permanently occupied by tool schemas you're not using. Tool bloat reduces your effective context for actual work.

How do I organize MCP tools for better performance?

Group tools by domain (bounded contexts), not by technical function. All content operations together. All search operations together. All memory operations together. Then implement progressive loading: start with discovery tools, load specific domains only when needed. This mirrors how Anthropic's own Claude Skills system works.

Does MCP tool overhead affect API costs?

Yes. If you're using Claude's API, you pay per token. Tool schemas count as input tokens. Running 50 tools at 300 tokens each means paying for 15,000 tokens on every single request, regardless of whether those tools get used. The meta-tool pattern can reduce this overhead by 90%+.

Why do MCP integrations get slower over time?

Scope creep. You start with 5 tools, add features, and suddenly you have 30. Each addition seems harmless, but the cumulative effect degrades performance. The fix isn't removing features; it's restructuring how tools load. Domain-based organization with on-demand loading lets you scale capabilities without scaling overhead.

Is there a best practice for MCP server architecture?

Yes: bounded contexts with progressive disclosure. Two meta-tools for discovery and execution, domain-organized agents for capabilities, on-demand loading for specific operations. This pattern scales from 10 tools to 100+ without proportional performance degradation. Anthropic validated this approach in their November 2025 engineering blog.
