Server-Side Tool Gating

Giving MCP servers a voice in tool selection

2026-03-15 by Divan Visagie d0d04aa

MCP servers are powerful tools, but they have no power over how the model decides to use them. They register their tools, describe them in a schema, and then sit there while the model guesses which ones to use. Every tool, whether relevant or not, gets loaded into the prompt, burning tokens and increasing the chance the model picks incorrectly.

If the user asks "what does my project note about feeding rats to pythons say?" and you have 6 tools registered, the model sees project_archive, notes_write, and notes_edit alongside the droids we are actually looking for. Those are noise, more noise than those people on LinkedIn with cyberpunk-stylised AI-generated profile pictures. At 30 or more tools across multiple servers, things get hairy. Research shows that LLM tool-selection accuracy can drop significantly beyond ~20 tools, with naive all-tools-loaded baselines achieving as low as ~14% accuracy, and it only gets worse the more MCP servers you add. A certain weather MCP I know has so many tools it adds 20 000 tokens!

By adding a single tool with a well-known name to one of my MCP servers, plus a small change to the client logic that runs before tool calls, I was able to cut 4 tools from every read-only request, saving ~318 tokens per turn. For slash commands we can even avoid calling the model entirely.

divan │ 14:32
[mcp] 4 server(s), 33 tool(s) ready
chell v0.2.0 (conversational engine)
Loading conversation from: /tmp/test.cmf
(Using Ollama backend at http://localhost:11434 with model: qwen3.5:4b)
When is the next train from T-Centralen to Ropsten?
[debug] gating check: mcp_manager=true, has_gating=true
[debug] gating: querying 1 server(s): pman-mcp
[debug] gating: 4 verdict(s) from 1 server(s)
[debug] gating: excluded 4 tool(s) (~318 tokens saved)
{project_new, notes_edit, notes_write, project_archive}
[debug] tool loop starting with 31 tool(s)

It's a humble saving in my test context, but if adopted across tooling it could save thousands of tokens.

The ecosystem has noticed the tool selection problem (probably at the moment we started loading 20 000 tokens for weather). OpenAI's Agents SDK has tool_filter. Google ADK has its own. Portkey built an embedding-based filter. The STRAP pattern consolidates tools into fewer "megatools." But all of these place filtering intelligence outside the server. The server, which owns the tools and knows their capabilities best, still has no say in the matter.

Server-Side Tool Gating

The fix is simple. Expose a well-known tool called _tool_gating. Its presence in tools/list signals that the server supports gating. No capability flag needed, no spec changes required.

An aware client detects this tool at connection time, then calls it on every request before building the tool list for the model. The server evaluates the request and returns verdicts: include, exclude, or claim.

Include is the default, so the response only contains exceptions.
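The pattern doesn't mandate a payload shape, since it rides on an ordinary tools/call. As a concrete illustration only (every field name here is my own, not from any spec), a call and its verdicts might look like this:

```python
# Hypothetical _tool_gating call and response. Field names
# ("message", "verdicts", "tool", "action") are illustrative.
request = {
    "name": "_tool_gating",
    "arguments": {
        "message": "what does my project note say?",
        "content_type": "text",
    },
}

# Include is the default, so the server only lists exceptions.
response = {
    "verdicts": [
        {"tool": "notes_write", "action": "exclude"},
        {"tool": "notes_edit", "action": "exclude"},
        {"tool": "project_archive", "action": "exclude"},
        # A claim verdict would also carry call arguments:
        # {"tool": "project_list", "action": "claim", "arguments": {}},
    ]
}

# The client collects excluded tool names before building the model's list.
excluded = {v["tool"] for v in response["verdicts"] if v["action"] == "exclude"}
```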

Exclude flow

User → Client: "What does my project note say?"
Client → Server: tools/call _tool_gating
Server → Client: exclude: notes_write, notes_edit, project_archive
Client → Model: message + 3 tools (notes_read, project_list, project_new)
Model → Client: tool_use: notes_read(...)
Client → Server: tools/call notes_read
Server → Client: result
Client → User: reply

Three tools removed from the model's context. The model sees a cleaner, smaller tool list and picks correctly on the first try.

Claim flow

User → Client: "/projects"
Client → Server: tools/call _tool_gating
Server → Client: claim: project_list, args: {}
Client → Server: tools/call project_list
Server → Client: project listing
Client → User: project listing (model never called)

The model is never called. A deterministic command resolves instantly. Slash commands that should resolve in milliseconds shouldn't wait for a round-trip through an LLM.

The Implementation

When I initially came up with this concept I was thinking of using the MCP resource primitive rather than a tool, but I decided against it once I started implementing. resources/read only accepts a URI, and passing arbitrary user messages through a URL felt too fragile to be worth it. A tool accepts structured input naturally. It also means clients that haven't been adapted for this filter could still discover the tool and perhaps use it; that remains untested, but it's a nice theory anyway.

I decided to test the idea by making changes in two existing projects: a Python MCP server (pman-mcp, using FastMCP) and a Rust MCP client (chell), which is basically my own little agent client.

Indiana Jones in the Well of Souls: "Python. Why did it have to be Python?"

Server side

The gating logic lives in a single file, gating.py. Simple keyword matching, not ML:

def evaluate_gating(message: str, content_type: str) -> list[Verdict]:
    verdicts = []
    lower = message.lower().strip()

    # Claims: deterministic commands
    if lower in ("/projects", "/list"):
        return [Verdict("project_list", "claim", arguments={})]

    if lower.startswith("/new "):
        name = message[5:].strip()
        return [Verdict("project_new", "claim", arguments={"name": name})]

    # Excludes: read-only intent
    read_only = is_read_only_intent(lower)
    if read_only:
        verdicts.append(Verdict("notes_write", "exclude"))
        verdicts.append(Verdict("notes_edit", "exclude"))
        verdicts.append(Verdict("project_new", "exclude"))

    # Archive only relevant with archive intent
    if not has_archive_intent(lower):
        verdicts.append(Verdict("project_archive", "exclude"))

    return verdicts

The claim patterns are intentionally narrow: exact matches or clear prefixes. The exclude logic errs on the side of inclusion. When in doubt, include.

Client side

Chell detects gating at connection time by scanning tools/list for _tool_gating. If found, it sets a flag on the server handle and removes the tool from the public list (the LLM never sees it).
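In chell this lives in Rust, but the detection step is small enough to sketch in Python (the function name is mine):

```python
def detect_gating(tools: list[str]) -> tuple[bool, list[str]]:
    """Scan a server's tools/list result for the well-known gating tool.

    Returns (supports_gating, public_tools). The gating tool itself is
    stripped from the public list so the LLM never sees it.
    """
    has_gating = "_tool_gating" in tools
    public = [t for t in tools if t != "_tool_gating"]
    return has_gating, public
```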

On each turn, before sending to the model:

  1. Call _tool_gating on all gating-capable servers
  2. If any verdict is claim, call the tool directly, return the result, skip the model
  3. If any verdicts are exclude, filter those tools from the list for this turn
  4. Send the filtered list to the model

Claim failures fall through to the normal LLM path. If the server's claim logic is wrong, the model still gets a chance.
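The per-turn flow above, sketched in Python (the real client is Rust); `call_tool` and `call_model` stand in for the actual transport and LLM calls, and the verdict field names are my own illustration:

```python
def run_turn(message, servers, call_tool, call_model):
    """One conversational turn with server-side gating applied."""
    excluded, claim = set(), None

    # 1. Ask every gating-capable server for verdicts.
    for server in servers:
        if not server.has_gating:  # flag set at connection time
            continue
        verdicts = call_tool(server, "_tool_gating", {"message": message})
        for v in verdicts:
            if v["action"] == "exclude":
                excluded.add(v["tool"])
            elif v["action"] == "claim" and claim is None:
                claim = (server, v["tool"], v.get("arguments", {}))

    # 2. A claim resolves without the model; failures fall through.
    if claim is not None:
        server, tool, args = claim
        try:
            return call_tool(server, tool, args)  # model never invoked
        except Exception:
            pass  # wrong claim logic: the model still gets a chance

    # 3-4. Filter excluded tools and send the trimmed list to the model.
    tools = [t for s in servers for t in s.tools if t not in excluded]
    return call_model(message, tools)
```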

Where the Idea Came From

Before MCP, I built a plugin system for Telegram bots where each capability had a check() method. Before routing a request, the bot would ask every registered capability: "can you handle this?" Each one would score its own suitability. Some would say "not relevant, skip me." Others would claim the request outright: a pattern-matched command, zero ambiguity, no need to involve the model at all.

I formalised this as the Layer-Capability Pattern. The core idea was that

The tool knows itself better than the model knows the tool

A PII scanner knows it's irrelevant when there are no privacy keywords. A regex matcher knows it should claim a request starting with /regex. The model can only guess from a text description.

xkcd 927, "Standards." Source.

MCP formalised almost everything I'd hacked together in those early bots: structured tool definitions, transport abstraction, discovery. But tool selection remained entirely model-controlled. The server registers tools, the model reads descriptions, the model decides.

_tool_gating closes that gap without creating a competing standard. It uses the primitives MCP already provides. No spec changes. No capability flags. One well-known tool and a small client-side change. If adopted across more servers and clients, that's millions of tokens saved. Especially ones about the weather.