Standard MCP tool usage has two scaling problems. First, every tool in the catalog is loaded into the LLM’s context upfront — with hundreds of tools, that’s tens of thousands of tokens spent before the LLM even reads the user’s request. Second, every tool call is a round-trip: the LLM calls a tool, the result passes back through the context window, the LLM reasons about it, calls another tool, and so on. Intermediate results that only exist to feed the next step still burn tokens flowing through the model.
CodeMode solves both problems. Instead of seeing your entire tool catalog, the LLM gets meta-tools for discovering what’s available and for writing and executing code that calls the tools it needs. It discovers on demand, writes a script that chains tool calls in a sandbox, and gets back only the final answer.
The approach was introduced by Cloudflare in Code Mode and explored further by Anthropic in Code Execution with MCP.
Getting Started
You take a normal server with normally registered tools and add a CodeMode transform. The transform wraps your existing tools in the code mode machinery — your tool functions don’t change at all:
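A sketch of the shape, assuming a decorator-style server and a CodeMode transform class (the import paths and exact signatures here are assumptions, not the library's confirmed API; check your version's docs):

```python
# Illustrative only: import paths and names are assumptions, not confirmed API.
from mcp_server import Server      # hypothetical server class
from codemode import CodeMode      # hypothetical transform

server = Server("calculator")

@server.tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@server.tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

# Wrap the existing tools in the code mode machinery; the tool
# functions themselves are unchanged.
code_mode = CodeMode(server)
```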
When the LLM connects to this server, it doesn’t see add and multiply directly. Instead, it sees the meta-tools that CodeMode provides — tools for discovering what’s available and executing code against it. The original tools are still there, but they’re accessed through the CodeMode layer.
Discovery
Before the LLM can write code that calls your tools, it needs to know what tools exist and how to call them. This is the discovery process — the LLM uses meta-tools to learn about your tool catalog, then writes code against what it finds. The fundamental tradeoff is tokens vs. round-trips. Each discovery step is an LLM round-trip: the model calls a tool, waits for the response, reasons about it, then decides what to do next. More steps mean less wasted context (each step is targeted) but more latency and API calls. Fewer steps mean the LLM gets information upfront but pays for detail it might not need. By default, CodeMode gives the LLM three tools — search, get_schema, and execute — creating a three-stage discovery flow:
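As a concrete sketch of those three stages over a toy catalog (the response shapes and matching logic are illustrative, not the exact wire format or ranking algorithm):

```python
import asyncio

# Toy catalog standing in for a real server's tools.
CATALOG = {
    "add": {"description": "Add two numbers.", "params": {"a": "int", "b": "int"}},
    "multiply": {"description": "Multiply two numbers.", "params": {"a": "int", "b": "int"}},
    "get_weather": {"description": "Current weather for a city.", "params": {"city": "str"}},
}

def search(query: str) -> list[dict]:
    # Stage 1: keyword match standing in for BM25 ranking; "brief" detail.
    words = query.lower().split()
    return [
        {"name": name, "description": meta["description"]}
        for name, meta in CATALOG.items()
        if any(w in meta["description"].lower() or w in name for w in words)
    ]

def get_schema(tools: list[str]) -> dict:
    # Stage 2: parameter details for the named tools.
    return {name: CATALOG[name]["params"] for name in tools}

async def execute(code: str):
    # Stage 3: run LLM-written code with call_tool as the only entry point.
    impls = {"add": lambda p: p["a"] + p["b"],
             "multiply": lambda p: p["a"] * p["b"]}

    async def call_tool(name, params):
        return impls[name](params)

    scope = {"call_tool": call_tool}
    exec(compile(code, "<llm>", "exec"), scope)
    return await scope["main"]()

found = search("numbers")                         # stage 1
schemas = get_schema([t["name"] for t in found])  # stage 2
script = """
async def main():
    total = await call_tool("add", {"a": 2, "b": 3})
    return await call_tool("multiply", {"a": total, "b": 4})
"""
print(asyncio.run(execute(script)))               # stage 3, prints 20
```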
Search for tools
First, the LLM uses the search meta-tool to find tools by keyword — for example, search(query="math numbers"). This lets the LLM know which tools are available and what they do, significantly reducing the surface area it needs to consider.
Get parameter details for the tools
Next, the LLM calls get_schema to get parameter details for the tools it found in the previous step — for example, get_schema(tools=["add", "multiply"]). Now the LLM knows the parameters for the tools it found, and can write code that chains the tool calls. If it needed more detail, it could have called get_schema with detail="full" to get the complete JSON schema.
Write and execute code that chains the tool calls
Finally, the LLM writes code that chains the tool calls and passes it to the execute tool, which runs it in a Python sandbox. Inside the sandbox, call_tool(name, params) is the only function available; the LLM uses it to compose tools into a workflow and return a final result, which is passed back to the LLM.
Discovery Tools
CodeMode ships with four built-in discovery tools: Search, GetSchemas, GetTags, and ListTools. By default, only Search and GetSchemas are enabled. Each tool supports a default_detail parameter that sets the default verbosity level, and the LLM can override the detail level on any individual call.
Detail Levels
Search and GetSchemas share the same three detail levels, so the same detail value produces the same output format regardless of which tool the LLM calls:
| Level | Output | Token cost |
|---|---|---|
| "brief" | Tool names and one-line descriptions | Cheapest — good for scanning |
| "detailed" | Compact markdown with parameter names, types, and required markers | Medium — often enough to write code |
| "full" | Complete JSON schema | Most expensive — everything |
Search defaults to "brief" and GetSchemas defaults to "detailed".
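To make the cost difference concrete, here is a sketch of how one tool might render at each level (the exact formatting is illustrative):

```python
import json

# Illustrative JSON schema for a single tool.
SCHEMA = {
    "name": "add",
    "description": "Add two numbers.",
    "inputSchema": {
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
}

def render(tool: dict, detail: str) -> str:
    if detail == "brief":
        # Name and one-line description only.
        return f"{tool['name']}: {tool['description']}"
    if detail == "detailed":
        # Compact markdown: parameter names, types, required markers.
        props = tool["inputSchema"]["properties"]
        required = set(tool["inputSchema"].get("required", []))
        params = ", ".join(
            f"{name}: {spec['type']}{'' if name in required else '?'}"
            for name, spec in props.items()
        )
        return f"### {tool['name']}({params})\n{tool['description']}"
    # "full": the complete JSON schema.
    return json.dumps(tool, indent=2)

for level in ("brief", "detailed", "full"):
    print(f"--- {level} ({len(render(SCHEMA, level))} chars) ---")
    print(render(SCHEMA, level))
```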
Search
Search finds tools by natural-language query using BM25 ranking. At its default "brief" detail, results include just tool names and descriptions — enough to decide which tools are worth inspecting further. The LLM can request "detailed" to get parameter schemas inline, or "full" for the complete JSON.
Search results include an annotation like "2 of 10 tools:" when the result set is smaller than the full catalog, so the LLM knows there are more tools to discover with different queries.
You can cap result count with default_limit. The LLM can also override the limit per call. This is useful for large catalogs where you want to keep search results focused:
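For example, a configuration along these lines (the import path and constructor signature are assumptions; check your version's API):

```python
# Illustrative only: import path and constructor names are assumptions.
from codemode.discovery import Search, GetSchemas  # hypothetical import path

discovery = [
    Search(default_limit=5),  # at most 5 results unless the LLM overrides
    GetSchemas(),
]
```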
Search also supports a tags parameter so the LLM can narrow results to specific categories before searching.
GetSchemas
GetSchemas returns parameter details for specific tools by name. At its default "detailed" level, it renders compact markdown with parameter names, types, and required markers. At "full", it returns the complete JSON schema — useful when tools have deeply nested parameters that the compact format doesn’t capture.
GetTags
GetTags lets the LLM browse tools by category using tag metadata. At brief detail, the LLM sees tag names with counts. At full detail, it sees tools listed under each tag:
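A from-scratch sketch of the two views (the output shapes are illustrative):

```python
# Toy catalog with tag metadata.
TOOLS = {
    "add": ["math"],
    "multiply": ["math"],
    "get_weather": ["weather"],
}

def get_tags(detail: str = "brief") -> str:
    # Group tools under their tags.
    by_tag: dict[str, list[str]] = {}
    for tool, tags in TOOLS.items():
        for tag in tags:
            by_tag.setdefault(tag, []).append(tool)
    if detail == "brief":
        # Tag names with counts: enough to orient in a large catalog.
        return "\n".join(f"{tag} ({len(tools)})" for tag, tools in sorted(by_tag.items()))
    # Full detail: tools listed under each tag.
    return "\n".join(
        f"{tag}: {', '.join(sorted(tools))}" for tag, tools in sorted(by_tag.items())
    )

print(get_tags("brief"))
print(get_tags("full"))
```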
GetTags isn’t included in the defaults — add it when browsing by category would help the LLM orient itself in a large catalog. The LLM can browse tags first, then pass specific tags into Search to narrow results.
ListTools
ListTools dumps the entire catalog at whatever detail level the LLM requests. It supports the same three detail levels as Search and GetSchemas, defaulting to "brief".
ListTools isn’t included in the defaults — for large catalogs, search-based discovery is more token-efficient. But for smaller catalogs (under ~20 tools), letting the LLM see everything upfront can be faster than multiple search round-trips:
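For example (names are assumptions, not the confirmed API):

```python
# Illustrative only: import path and parameter names are assumptions.
from codemode.discovery import ListTools  # hypothetical import path

# Small catalog: let the LLM see everything in one call.
discovery = [ListTools(default_detail="detailed")]
```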
Discovery Patterns
The right discovery configuration depends on your server — how many tools you have and how complex their parameters are. It may be tempting to minimize round-trips by collapsing everything into fewer steps, but for the complex servers that benefit most from CodeMode, our experience is that staged discovery leads to better results. Flooding the LLM with detailed schemas for tools it doesn’t end up using can hurt more than the extra round-trip costs. Each pattern below is a complete, copyable configuration.
Three-Stage
The default. The LLM searches for candidates, inspects schemas for the ones it wants, then writes code. Best for large or complex tool sets where you want to minimize context usage — the LLM only pays for schemas it actually needs. You can also add GetTags so the LLM can browse by category before searching — giving it four stages of progressive disclosure:
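Sketches of both configurations (import paths and constructor names are assumptions; check your version's API):

```python
# Illustrative only: names are assumptions, not the confirmed API.
from codemode.discovery import Search, GetSchemas, GetTags  # hypothetical

# Three-stage (the default): search, then get_schema, then execute.
discovery = [Search(), GetSchemas()]

# Four-stage: browse tags first, then search within them.
discovery = [GetTags(), Search(), GetSchemas()]
```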
Two-Stage
Search returns parameter schemas inline, so the LLM can go straight from search to execute. Best for smaller catalogs where the extra tokens per search result are a reasonable price for one fewer round-trip. GetSchemas is still available as a fallback — the LLM can call it with detail="full" if it encounters a tool with complex nested parameters where the compact markdown isn’t enough.
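A sketch of this configuration (names are assumptions):

```python
# Illustrative only: names are assumptions, not the confirmed API.
from codemode.discovery import Search, GetSchemas  # hypothetical

discovery = [
    Search(default_detail="detailed"),  # schemas inline: search, then execute
    GetSchemas(),                       # fallback for detail="full"
]
```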
Single-Stage
Skip discovery entirely and bake tool instructions into the execute tool’s description. Best for very simple servers where the LLM already knows what tools are available — maybe there are only a few, or they’re described in the system prompt.
Custom Discovery Tools
Discovery tools are composable — you can mix the built-ins with your own. Each discovery tool is a callable that receives catalog access and returns a Tool. The catalog accessor is a function (not the catalog itself) because the catalog is request-scoped — different users may see different tools based on auth.
Here’s a minimal example:
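A from-scratch sketch of the shape (the Tool dataclass and the count_tools tool here are hypothetical stand-ins for the library's real types):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:  # stand-in for the library's real Tool type
    name: str
    description: str
    fn: Callable

def count_tools(get_catalog: Callable[[], dict]) -> Tool:
    """Custom discovery tool: report how many tools the caller can see."""
    def run() -> str:
        # Call the accessor per request, since the catalog is request-scoped.
        catalog = get_catalog()
        return f"{len(catalog)} tools available"
    return Tool(name="count_tools",
                description="Report the number of visible tools.",
                fn=run)

# Usage: the transform would pass in its request-scoped accessor.
tool = count_tools(lambda: {"add": ..., "multiply": ...})
print(tool.fn())  # prints "2 tools available"
```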
Sandbox Configuration
Resource Limits
The default MontySandboxProvider can enforce execution limits — timeouts, memory caps, recursion depth, and more. Without limits, LLM-generated scripts can run indefinitely.
| Key | Type | Description |
|---|---|---|
| max_duration_secs | float | Maximum wall-clock execution time |
| max_memory | int | Memory ceiling in bytes |
| max_allocations | int | Cap on total object allocations |
| max_recursion_depth | int | Maximum recursion depth |
| gc_interval | int | Garbage collection frequency |
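For example (the import path and keyword shape are assumptions; check your version's docs):

```python
# Illustrative only: import path and keyword shape are assumptions.
from codemode.sandbox import MontySandboxProvider  # hypothetical import path

sandbox = MontySandboxProvider(
    max_duration_secs=5.0,        # kill scripts after 5 seconds
    max_memory=64 * 1024 * 1024,  # 64 MiB ceiling
    max_recursion_depth=100,
)
```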
Custom Sandbox Providers
You can replace the default sandbox with any object implementing the SandboxProvider protocol:
The external_functions dict contains async callables injected into the sandbox scope — execute uses this to provide call_tool.
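A from-scratch sketch of the idea. This in-process provider uses exec, which provides no isolation, so it is illustrative only; the method name execute and the external_functions convention follow the description above but are otherwise assumptions:

```python
import asyncio
from typing import Any, Callable

class InProcessSandboxProvider:
    """Toy SandboxProvider: runs code in-process. NOT isolated; demo only."""

    async def execute(self, code: str,
                      external_functions: dict[str, Callable]) -> Any:
        # Inject the async callables (e.g. call_tool) into the script's scope.
        scope: dict[str, Any] = dict(external_functions)
        exec(compile(code, "<sandbox>", "exec"), scope)
        # Convention assumed here: the script defines `async def main()`.
        return await scope["main"]()

async def demo():
    async def call_tool(name, params):
        return params["a"] + params["b"] if name == "add" else None

    provider = InProcessSandboxProvider()
    return await provider.execute(
        "async def main():\n    return await call_tool('add', {'a': 1, 'b': 2})",
        {"call_tool": call_tool},
    )

print(asyncio.run(demo()))  # prints 3
```

A production provider would do the same injection inside a real isolation boundary (subprocess, container, or a restricted interpreter) rather than in-process exec.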
