Programmatic Tools
What this page covers
- Why we offer two capabilities: programmatic Codemode and reusable Skills
- How sandbox, codemode, and skills relate to each other
- How the toggles behave in the UI/backend
- When to use each mode, and when to combine them
- Configuration levers (allow_direct_tool_calls, reranker hook) and their tradeoffs
Three distinct concepts
Sandbox, Codemode, and Skills are separate concepts with a clear dependency hierarchy.
Sandbox
A sandbox is an isolated code-execution environment. It runs independently of Codemode and Skills and can be used on its own for:
- Pre-hooks: running setup code before an agent starts (e.g. injecting variables)
- WebSocket status monitoring: the /configure/sandbox/ws endpoint reports sandbox availability
- Standalone code runs: any toolset or hook that needs to execute Python
Two variants are supported:
- eval: in-process Python execution (fast, default)
- jupyter: dedicated Jupyter kernel per agent (isolated, full kernel state)
Set sandbox_variant in an agent spec to choose the variant. A sandbox is created eagerly at agent-creation time when sandbox_variant is specified, regardless of whether Codemode or Skills are enabled.
Codemode
Codemode lets the agent write Python to compose MCP tools programmatically (control flow, parallelism, error handling). Internally it builds tool bindings from the MCP registry and runs generated code via execute_code.
Codemode requires a sandbox. Without a sandbox there is nowhere to execute code. If no sandbox_variant is specified but Codemode is enabled, an eval sandbox is created automatically.
Skills
Skills are reusable workflows packaged with instructions, resources, and executable scripts. The AgentSkillsToolset exposes list_skills, load_skill, read_skill_resource, and run_skill_script.
Skills require a sandbox. Script execution (run_skill_script) runs inside a sandbox. If no sandbox_variant is specified but Skills are enabled, an eval sandbox is created automatically.
Dependency summary
          Sandbox ───────────────────► exists independently
             ▲
             │ requires
      ┌──────┴──────┐
  Codemode       Skills
| Scenario | Sandbox | Codemode | Skills |
|---|---|---|---|
| Pre-hook only | ✅ required | — | — |
| Status monitoring | ✅ required | — | — |
| Codemode only | ✅ auto-created (eval) | ✅ | — |
| Skills only | ✅ auto-created (eval) | — | ✅ |
| Codemode + Skills | ✅ shared | ✅ | ✅ |
| Sandbox only (jupyter) | ✅ explicit | — | — |
When both Codemode and Skills are enabled they share the same sandbox instance so variables and installed packages are visible across both.
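The dependency rules above can be summarized as a small decision function. This is a minimal sketch, not the agent-runtimes implementation: `resolve_sandbox_variant` and its parameter names are hypothetical, and only the `eval` default and the "explicit variant wins" rule come from the text.

```python
from typing import Optional


def resolve_sandbox_variant(
    sandbox_variant: Optional[str],
    enable_codemode: bool,
    enable_skills: bool,
) -> Optional[str]:
    """Return the sandbox variant to create, or None when no sandbox is needed.

    Hypothetical helper: it only mirrors the rules described in the text.
    """
    if sandbox_variant is not None:
        # An explicit variant always wins and is created eagerly.
        return sandbox_variant
    if enable_codemode or enable_skills:
        # Codemode and Skills both need somewhere to execute code.
        return "eval"
    return None


print(resolve_sandbox_variant(None, True, False))
print(resolve_sandbox_variant("jupyter", False, False))
print(resolve_sandbox_variant(None, False, False))
```

When both features are enabled the function still returns a single variant, which matches the shared-sandbox row of the table.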
Goals
- Codemode: let the agent write Python to compose tools directly, reducing LLM tool-call overhead and enabling control flow, parallelism, and error handling.
- Skills: package reusable workflows (instructions, resources, scripts) so an agent can load, read resources, and run scripts without writing new code each time.
- Coexistence: you can enable Skills, Codemode, or both. Codemode provides its own tool discovery/execution; Skills surface curated capabilities.
Toggles and behavior
- Enable Codemode: adds CodemodeToolset (list/search tools, execute Python code). When Codemode is on, selected MCP servers are used to build Codemode’s registry (tools are exposed via Codemode meta-tools rather than direct MCP tools).
- Enable Skills: adds AgentSkillsToolset (skills discovery, load, resource read, script run). Uses the configured skills/ directory.
- Both on: Skills are automatically wired into Codemode via wire_skills_into_codemode(). Skill operations become available as generated bindings (from generated.skills import ...) inside execute_code, alongside MCP tool bindings. The AgentSkillsToolset is still attached for direct tool access, but the primary interaction path is through Codemode's code execution.
Codemode toolset capabilities
- list_tool_names (fast names-only, supports server, keywords, limit)
- search_tools (returns tool definitions; can be reranked, includes deferred tools by default)
- get_tool_details (full schema, output shape, and examples for one tool)
- list_servers (connected MCP servers)
- execute_code (Python in sandbox; import bindings from generated.mcp.<server>)
- call_tool (single tool call): optional, see below
Direct tool calls: allow_direct_tool_calls
- When false (default in agent-runtimes codemode setup):
  - call_tool is hidden; all execution goes through execute_code.
  - Instructions emphasize code-first composition, mirroring the “write code to use tools” pattern.
- When true (opt-in):
  - call_tool is exposed as a convenience for single calls.
  - Useful for simple queries where code would be overkill.
- Choose false for stricter discipline and clearer audit of composed workflows; choose true for convenience at the cost of less enforced structure.
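As an illustration of the flag's effect, the sketch below lists which Codemode meta-tools an agent would see in each mode. The tool names come from the capability list above; the function `exposed_codemode_tools` is a hypothetical helper, not part of the Codemode API.

```python
def exposed_codemode_tools(allow_direct_tool_calls: bool = False) -> list[str]:
    """Return the Codemode meta-tool names visible to the agent.

    Hypothetical helper for illustration only.
    """
    tools = [
        "list_tool_names",
        "search_tools",
        "get_tool_details",
        "list_servers",
        "execute_code",
    ]
    if allow_direct_tool_calls:
        # call_tool is the only tool gated by the flag.
        tools.append("call_tool")
    return tools


print(exposed_codemode_tools())
print(exposed_codemode_tools(allow_direct_tool_calls=True))
```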
Search quality: reranker hook
- search_tools accepts an optional async reranker: tool_reranker(tools, query, server) -> list[ToolDefinition].
- Use it to reorder results (e.g., LLM-based relevance, business priority). Failures fall back to registry order.
- This keeps discovery flexible without tying Codemode to any specific model provider.
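A reranker matching the signature above could look like the following sketch. It ranks by simple keyword overlap rather than an LLM. The `ToolDefinition` dataclass here is a hypothetical stand-in with only two fields; the real type is assumed to carry more.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToolDefinition:
    """Hypothetical stand-in for the real ToolDefinition type."""

    name: str
    description: str


async def tool_reranker(tools, query, server=None):
    """Order tools by how many query terms appear in their name or description."""
    terms = set(query.lower().split())

    def score(tool):
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        return len(terms & set(text.split()))

    # sorted() is stable, so tools with equal scores keep registry order.
    return sorted(tools, key=score, reverse=True)


registry = [
    ToolDefinition("read_file", "Read a file from disk"),
    ToolDefinition("web_search", "Search the web"),
    ToolDefinition("list_dir", "List a directory"),
]
ranked = asyncio.run(tool_reranker(registry, "search web"))
print([tool.name for tool in ranked])
```

Because the hook is a plain async callable, the scoring function can later be swapped for a model-based one without touching the registry.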
Execution model
- Codemode runs Python in an isolated sandbox (code-sandboxes). Agents import tool bindings from generated.mcp.<server_name>.
- Use async/await, loops, conditionals, and asyncio.gather for parallelism.
- Errors are caught and surfaced in the execute_code result payload, alongside stdout/stderr.
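The parallelism point can be illustrated outside the sandbox with a stub in place of a generated binding. In this sketch `read_file` is a local stand-in for a binding that would normally be imported from generated.mcp.<server_name>, and `asyncio.run` replaces the top-level `await` that the sandbox examples on this page use.

```python
import asyncio


async def read_file(args: dict) -> str:
    """Stub standing in for a generated MCP binding (hypothetical behavior)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if args["path"].endswith(".missing"):
        raise FileNotFoundError(args["path"])
    return f"contents of {args['path']}"


async def read_many(paths):
    """Read all paths concurrently and separate successes from failures."""
    results = await asyncio.gather(
        *(read_file({"path": path}) for path in paths),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    ok, failed = {}, {}
    for path, result in zip(paths, results):
        if isinstance(result, Exception):
            failed[path] = repr(result)
        else:
            ok[path] = result
    return ok, failed


ok, failed = asyncio.run(read_many(["/data/a.txt", "/data/b.missing"]))
print(ok)
print(failed)
```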
Skills model
- Skills live in a skills/ directory with SKILL.md, resources, and scripts.
- Tooling provided by AgentSkillsToolset: list_skills, load_skill, read_skill_resource, run_skill_script.
- Skills can be authored as files or programmatic callables; they complement Codemode when you want curated, reusable behaviors.
UI & backend flow
- UI checkboxes toggle Skills/Codemode. If Codemode is enabled, MCP server selection is still available and is used to scope Codemode discovery.
- Backend route /api/v1/agents builds toolsets accordingly:
  - sandbox_variant set → sandbox is created eagerly, independent of Codemode/Skills status.
  - Skills enabled → eval sandbox auto-created (if no sandbox_variant); AgentSkillsToolset added.
  - Codemode enabled → eval sandbox auto-created (if no sandbox_variant); CodemodeToolset added with allow_direct_tool_calls=False by default.
  - Both enabled → both toolsets share the same sandbox, then skills are wired into Codemode via wire_skills_into_codemode().
Tool approvals mechanism
Tool approvals have two complementary paths:
- Per-tool policy approvals for MCP/Skills (server + tool allowlists reflected in WebSocket snapshots).
- Per-call deferred approvals for tools marked requires_approval=True (runtime stop-the-world approval/continue).
Both are designed to keep WebSocket state as the source of truth while minimizing unnecessary round-trips.
1) Policy approvals (MCP/Skills)
- The UI tracks approved MCP/Skill tools via explicit allowlists (no implicit auto-approval).
- Backend approval state is synced through snapshots/events and reflected in store fields such as approved_tools_by_server.
- During execution, guardrails check these allowlists before requesting any new approval.
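An explicit allowlist check can be as small as the sketch below. The field name approved_tools_by_server comes from the text; its shape (a mapping from server name to a set of tool names) and the helper `is_tool_approved` are assumptions made for illustration.

```python
def is_tool_approved(
    approved_tools_by_server: dict[str, set[str]],
    server: str,
    tool: str,
) -> bool:
    """Return True only when the tool is explicitly allowlisted for the server.

    Unknown servers and unknown tools are never approved implicitly.
    """
    return tool in approved_tools_by_server.get(server, set())


# Assumed shape: server name -> set of approved tool names.
approved = {"filesystem": {"read_file"}, "skills": {"list_skills", "load_skill"}}
print(is_tool_approved(approved, "filesystem", "read_file"))
print(is_tool_approved(approved, "filesystem", "write_file"))
print(is_tool_approved(approved, "github", "create_issue"))
```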
2) Deferred per-call approvals (DeferredToolRequests)
When a call requires runtime approval, pydantic-ai emits DeferredToolRequests with one or more ToolCallPart entries.
Agent-runtimes resolves these in two stages:
1. Inline fast path (second-pass behavior)
   - ToolsGuardrailCapability.handle_deferred_tool_calls() inspects local approval records first.
   - If a matching decision already exists (approved/rejected), it builds DeferredToolResults inline (using build_results() when available).
   - Already-approved calls continue immediately in the same run; already-rejected calls map to ToolDenied.
2. Stop-the-world WebSocket path (unresolved only)
   - Any unresolved approvals bubble out as DeferredToolRequests.
   - The adapter requests approval via ToolApprovalManager, emits approval events, and waits for the user/reviewer decision.
   - On decision, the adapter resumes the run with deferred_tool_results so only the unresolved subset is continued.
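The two-stage resolution amounts to partitioning the deferred calls by whether a decision is already on record. The sketch below uses plain Python types only: `PendingCall`, `split_deferred_calls`, and the decisions mapping are hypothetical and do not reproduce the pydantic-ai or agent-runtimes classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingCall:
    """Hypothetical stand-in for one deferred tool call."""

    tool_call_id: str
    tool_name: str


def split_deferred_calls(calls, decisions):
    """Partition calls into inline-resolved results and unresolved calls.

    decisions maps a tool name to True (approved) or False (rejected);
    a missing key means no decision has been recorded yet.
    """
    resolved, unresolved = {}, []
    for call in calls:
        decision = decisions.get(call.tool_name)
        if decision is None:
            unresolved.append(call)  # stage 2: needs a reviewer decision
        else:
            resolved[call.tool_call_id] = decision  # stage 1: inline fast path
    return resolved, unresolved


calls = [
    PendingCall("c1", "read_file"),
    PendingCall("c2", "delete_file"),
    PendingCall("c3", "send_email"),
]
resolved, unresolved = split_deferred_calls(
    calls, {"read_file": True, "delete_file": False}
)
print(resolved)
print([call.tool_call_id for call in unresolved])
```

Only the `unresolved` list would need the stop-the-world round-trip; the `resolved` mapping can be applied immediately.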
Event and state flow
- Approval records are created/updated in the local approval store.
- If AI-agents integration is enabled, decisions can be mirrored/relayed and then synced back.
- WebSocket snapshots/events update frontend state (codemodeStatus/mcpStatus/fullContext) and drive the UI badges/toggles.
Why this hybrid model
- Fast for known decisions: avoids repeated stop/resume cycles for calls that are already approved.
- Safe for unknown decisions: unresolved calls still require explicit approval through the WebSocket workflow.
- Consistent UX: frontend always reflects authoritative snapshot state rather than optimistic local merges.
Skills-in-Codemode wiring
When both Skills and Codemode are enabled, agent-runtimes automatically wires skills into Codemode so that skill operations are available as generated bindings inside execute_code. This is handled by wire_skills_into_codemode() in agent_factory.py.
What the wiring does
1. Generates skill bindings: calls generate_skill_bindings() to create generated/mcp/skills/ with wrapper functions for list_skills, load_skill, read_skill_resource, and run_skill.
2. Sets a skill tool caller: registers a callback on the executor that routes skills__* tool calls to the real AgentSkillsToolset.
3. Stores skills metadata: calls set_skills_metadata() on the executor so that remote sandboxes (Datalayer Runtime, Jupyter kernel) can regenerate skill bindings inline.
4. Registers a skills proxy caller: for remote sandboxes, registers a proxy caller in mcp_proxy.py so that HTTP-routed skills__* calls reach the AgentSkillsToolset.
Post-init callback pattern
Since CodemodeToolset lazily initializes its executor, the wiring is deferred using a post-init callback:
# In rebuild_codemode (app.py / routes/agents.py)
if skills_enabled:
    codemode_toolset.add_post_init_callback(
        lambda ts: wire_skills_into_codemode(ts, skills_toolset)
    )
The callback fires once when the executor is first created (e.g., on the first execute_code call), ensuring the executor exists before binding generation and caller registration happen.
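The fire-once behavior can be reproduced with a small lazy-initializing class. This is a sketch of the pattern only: `LazyToolset` is a hypothetical stand-in, and apart from the method name add_post_init_callback nothing here is taken from the real CodemodeToolset.

```python
class LazyToolset:
    """Hypothetical toolset that creates its executor on first use."""

    def __init__(self):
        self._executor = None
        self._post_init_callbacks = []

    def add_post_init_callback(self, callback):
        """Register a callable to run once, right after the executor exists."""
        self._post_init_callbacks.append(callback)

    def _ensure_executor(self):
        if self._executor is None:
            self._executor = object()  # placeholder for a real executor
            for callback in self._post_init_callbacks:
                callback(self)
        return self._executor

    def execute_code(self, code):
        self._ensure_executor()
        return f"ran {len(code)} characters"


events = []
toolset = LazyToolset()
toolset.add_post_init_callback(lambda ts: events.append("skills wired"))
events_before_first_run = list(events)  # still empty: no executor yet
toolset.execute_code("print('first')")
toolset.execute_code("print('second')")
print(events_before_first_run, events)
```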
Using skills in execute_code
Once wired, agents can call skill operations from within execute_code:
execute_code(code='''
from generated.skills import list_skills, load_skill, run_skill
from generated.mcp.filesystem import read_file
# Discover available skills
skills = await list_skills({})
print(f"Skills: {[s['name'] for s in skills]}")
# Load a skill's instructions
instructions = await load_skill({"skill_name": "pdf-extractor"})
print(instructions)
# Run a skill script
result = await run_skill({
    "skill_name": "pdf-extractor",
    "script_name": "extract.py",
    "args": {"path": "/data/report.pdf"}
})
print(result)
''')
Remote sandbox support
When the code sandbox is remote (e.g., a Datalayer Runtime or Jupyter kernel), the skill bindings are generated inline inside the sandbox by _generate_tools_in_sandbox(). Tool calls from the remote sandbox make HTTP requests to agent-runtimes, where the skills proxy in mcp_proxy.py intercepts server_name == "skills" and routes the call to the registered AgentSkillsToolset.
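The routing decision described above reduces to a dispatch on the server name. The sketch below assumes two async callables with made-up signatures; `route_tool_call` and both fake callers are hypothetical and only the `server_name == "skills"` check comes from the text.

```python
import asyncio


async def route_tool_call(server_name, tool_name, args, skills_caller, mcp_caller):
    """Send skills calls to the skills caller and everything else to MCP."""
    if server_name == "skills":
        return await skills_caller(tool_name, args)
    return await mcp_caller(server_name, tool_name, args)


async def fake_skills_caller(tool_name, args):
    # Stand-in for the registered AgentSkillsToolset caller.
    return {"handled_by": "skills", "tool": tool_name}


async def fake_mcp_caller(server_name, tool_name, args):
    # Stand-in for a regular MCP server call.
    return {"handled_by": server_name, "tool": tool_name}


skills_result = asyncio.run(
    route_tool_call("skills", "list_skills", {}, fake_skills_caller, fake_mcp_caller)
)
mcp_result = asyncio.run(
    route_tool_call(
        "filesystem",
        "read_file",
        {"path": "/data/report.pdf"},
        fake_skills_caller,
        fake_mcp_caller,
    )
)
print(skills_result)
print(mcp_result)
```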
Skills-first, Codemode glue (example)
Use Skills for curated, reusable steps and Codemode for orchestration:
# 1) Load skill instructions
load_skill("pdf-extractor")
# 2) Compose multi-step flow in code
execute_code(code='''
from generated.mcp.filesystem import read_file
from generated.skills import run_skill
doc = await read_file({"path": "/data/report.pdf"})
result = await run_skill({
    "skill_name": "pdf-extractor",
    "script_name": "extract.py",
    "args": {"path": "/data/report.pdf"}
})
print(result)
''')
Choosing a pattern
- Use Skills when you want repeatable, reviewed workflows and clearer governance.
- Use Codemode when you need ad-hoc composition, control flow, and efficiency.
- Use both when you want curated capabilities (Skills) plus the ability to glue them together with code (Codemode). When both are enabled, skills are automatically wired into Codemode as generated bindings (from generated.skills import ...), giving agents a unified import pattern for all tools.
Configuration summary
- allow_direct_tool_calls: hide or expose call_tool (defaults to false in agent-runtimes wiring).
- tool_reranker: optional async hook to reorder search_tools results.
- keywords/limit on list_tool_names: faster, filtered discovery.
- include_deferred on search_tools/list_tool_names: control discovery of tools marked defer_loading.
- max_tool_calls: optional per-run safety cap on tool invocations inside execute_code.
allow_direct_tool_calls
- What it does: Controls whether the call_tool shortcut is exposed. When off, all tool usage flows through execute_code, which keeps execution auditable and code-first.
- Default: false in agent-runtimes wiring (CodemodeToolset is instantiated with direct calls disabled).
- When to set true: Simple, single-step calls where the overhead of writing code is unnecessary, or for rapid prototyping.
- When to keep false: Shared/production agents, where you want consistent logging, fewer accidental direct calls, and clearer control over how tools are composed.
tool_reranker
- What it does: Optional async hook tool_reranker(tools, query, server) -> list[ToolDefinition] that reorders search_tools results.
- Why use it: Apply model-based relevance, business rules, or safety filters to discovery without changing the registry itself.
- Behavior on error: If the hook raises or fails, Codemode falls back to registry order; search still returns results.
- Usage tip: Log applied ordering when the hook is enabled to keep discovery transparent.
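The error behavior and the logging tip can be combined on the calling side, as in this sketch. `rerank_or_fallback` is a hypothetical wrapper written for illustration, not the Codemode internals, and the tools are plain strings to keep the example short.

```python
import asyncio
import logging

logger = logging.getLogger("codemode.rerank")


async def rerank_or_fallback(tools, query, server=None, tool_reranker=None):
    """Apply the optional reranker; keep registry order if it is absent or fails."""
    if tool_reranker is None:
        return list(tools)
    try:
        ranked = await tool_reranker(tools, query, server)
    except Exception:
        logger.warning("reranker failed, using registry order", exc_info=True)
        return list(tools)
    # Log the applied order so discovery stays transparent.
    logger.info("reranked order for %r: %s", query, ranked)
    return ranked


async def reverse_reranker(tools, query, server):
    return list(reversed(tools))


async def broken_reranker(tools, query, server):
    raise RuntimeError("model unavailable")


registry_order = ["read_file", "web_search"]
reranked = asyncio.run(
    rerank_or_fallback(registry_order, "search", None, reverse_reranker)
)
fallback = asyncio.run(
    rerank_or_fallback(registry_order, "search", None, broken_reranker)
)
print(reranked, fallback)
```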
Safety and clarity tips
- Prefer allow_direct_tool_calls=False in shared/production agents to keep a single, auditable execution path via execute_code.
- When enabling reranking, log or trace the applied order for transparency.
- Keep Skills concise and documented; use Codemode for bespoke multi-step tasks.
Recommended defaults (cookbook)
Production
- enable_skills: true
- enable_codemode: true
- allow_direct_tool_calls: false
- enable_tool_reranker: true (if you have a safe reranker configured)
- max_tool_calls: set a conservative limit (e.g., 50–200) to prevent runaway loops
Prototyping
- enable_skills: optional
- enable_codemode: true
- allow_direct_tool_calls: true
- enable_tool_reranker: optional
- max_tool_calls: unset or higher limit for exploration
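To make the max_tool_calls recommendation concrete, the sketch below shows one way a per-run cap could behave: every tool invocation charges a budget, and the run stops once the budget is spent. `ToolCallBudget` is a hypothetical class and the choice of RuntimeError is an assumption; the real enforcement inside execute_code may differ.

```python
class ToolCallBudget:
    """Hypothetical per-run counter that enforces a max_tool_calls cap."""

    def __init__(self, max_tool_calls=None):
        self.max_tool_calls = max_tool_calls  # None means no cap
        self.used = 0

    def charge(self):
        """Record one tool call, or raise if the cap is already reached."""
        if self.max_tool_calls is not None and self.used >= self.max_tool_calls:
            raise RuntimeError(f"max_tool_calls={self.max_tool_calls} exceeded")
        self.used += 1


budget = ToolCallBudget(max_tool_calls=2)
budget.charge()
budget.charge()
try:
    budget.charge()  # third call goes over the cap
    stopped = False
except RuntimeError:
    stopped = True
print(budget.used, stopped)
```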
MCP Servers
Agent Runtimes provides comprehensive support for MCP Servers, enabling agents to access external tools and data sources through a standardized interface.
Overview
MCP servers are external processes that provide tools, resources, and prompts to AI agents. Agent Runtimes supports two types of MCP server configurations:
MCP Config (from mcp.json)
MCP Config servers are user-defined servers configured in ~/.datalayer/mcp.json. These servers:
- Start automatically when the agent runtime starts
- Are fully customizable with your own commands, arguments, and environment variables
- Appear in the agent form where users can select which servers to include as toolsets
- Support any MCP-compatible server - if it follows the MCP specification, it will work
- Are stored separately from catalog servers, allowing the same ID in both without conflict
MCP Catalog (predefined servers)
MCP Catalog servers are predefined server configurations included with Agent Runtimes. These servers:
- Are NOT started automatically - users must explicitly enable them via API
- Can be enabled on-demand using the /api/v1/mcp/servers/catalog/{server_name}/enable endpoint
- Provide common tools like web search, file system access, etc.
- Have their own storage separate from config servers
For most users, MCP Config is recommended. Add your servers to ~/.datalayer/mcp.json and they'll be available automatically when the agent runtime starts.
The same server ID can exist in both config and catalog - they are tracked independently.
Key Features
- Automatic Server Lifecycle — Config servers start with the application and stop on shutdown
- Retry with Backoff — Transient failures trigger automatic retries with exponential backoff
- Sequential Startup — Multiple MCP servers start sequentially to avoid resource conflicts
- Status Monitoring — Real-time status of all MCP toolsets via API endpoint
- Separate Storage — Config and catalog servers are stored independently, allowing same IDs in both
- Runtime Updates — Dynamically add/remove MCP servers from running agents via PATCH API
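Sequential startup with retry and exponential backoff can be sketched as follows. The function names, the retry count, the base delay, and the choice of ConnectionError as the transient failure are all assumptions for illustration; they are not the Agent Runtimes defaults.

```python
import asyncio


async def start_with_retry(start, retries=3, base_delay=0.01):
    """Await start(), retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await start()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the failure
            # Delay doubles on each attempt: base_delay * 1, 2, 4, ...
            await asyncio.sleep(base_delay * 2**attempt)


async def start_all(starters):
    """Start servers one at a time, in insertion order."""
    started = []
    for name, start in starters.items():
        await start_with_retry(start)
        started.append(name)
    return started


attempts = {"count": 0}


async def flaky_start():
    # Fails twice, then succeeds, to exercise the retry loop.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("server not ready")


async def steady_start():
    return None


started = asyncio.run(start_all({"filesystem": flaky_start, "chart": steady_start}))
print(started, attempts["count"])
```

Starting servers in a plain loop, rather than with asyncio.gather, is what keeps startup sequential.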
MCP Server Examples
Agent Runtimes supports any MCP-compatible server—if it follows the MCP specification, it will work. The table below shows a few popular examples to get you started:
| Server | URL | Type | Description |
|---|---|---|---|
| Tavily | docs | Remote | Web search and content extraction |
| Filesystem | modelcontextprotocol/servers | Local | File system access |
| GitHub | github/github-mcp-server | Local | GitHub repository access |
| Google Workspace | taylorwilsdon/google_workspace_mcp | Local | Google Workspace (Gmail, Gdrive, etc.) access |
| Slack | datalayer/slack-mcp-server | Local | Slack workspace access |
| Kaggle | docs | Remote | Kaggle datasets, models, competitions, notebooks |
| AlphaVantage | docs | Local | Financial market data |
| Chart | antvis/mcp-server-chart | Local | Charting and visualization |
| Brave Search | modelcontextprotocol/servers | Local | Web search |
| LinkedIn | stickerdaniel/linkedin-mcp-server | Local | LinkedIn profile, company, and job data |
- Local servers run as child processes on your machine (started via npx or uvx)
- Remote servers are hosted externally and accessed over HTTP (e.g., Kaggle MCP)
Both types are configured in the same mcp.json file, but remote servers use mcp-remote as a bridge.
See the MCP Servers Directory for more options.
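Since remote servers all follow the same npx plus mcp-remote shape, their mcp.json entries can be generated. The helper `remote_server_entry` below is hypothetical; the command, the `-y` flag, and the `--header` argument are the ones used in the configuration examples on this page.

```python
import json


def remote_server_entry(url, headers=None):
    """Build an mcp.json entry that bridges to a remote server via mcp-remote."""
    args = ["-y", "mcp-remote", url]
    for name, value in (headers or {}).items():
        # Each header is passed as its own --header argument.
        args += ["--header", f"{name}: {value}"]
    return {"command": "npx", "args": args}


config = {
    "mcpServers": {
        "kaggle": remote_server_entry(
            "https://www.kaggle.com/mcp",
            {"Authorization": "Bearer <KAGGLE_TOKEN>"},
        )
    }
}
print(json.dumps(config, indent=2))
```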
Quick Start
Configuring MCP Config Servers
MCP Config servers are configured in ~/.datalayer/mcp.json. These servers start automatically when the agent runtime starts and appear in the agent creation form.
{
  "mcpServers": {
    "tavily-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>"
      ]
    }
  }
}
Environment variables are automatically expanded using ${VAR_NAME} syntax.
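The expansion can be pictured with the sketch below, which walks a parsed configuration and substitutes ${VAR_NAME} references. It is an illustration, not the Agent Runtimes loader: `expand_env` is a hypothetical helper, and leaving unknown variables untouched is an assumption about the behavior.

```python
import os
import re

_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR_NAME} in strings, lists, and dicts."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Assumption: unknown variables are left as-is rather than emptied.
        return _VAR_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    return value


entry = {
    "command": "uvx",
    "args": ["av-mcp==0.2.1", "${ALPHAVANTAGE_API_KEY}"],
    "env": {"MAX_RESPONSE_TOKENS": "100000"},
}
expanded = expand_env(entry, {"ALPHAVANTAGE_API_KEY": "demo-key"})
print(expanded["args"])
```

Referencing secrets this way keeps API keys out of mcp.json itself. The variable name ALPHAVANTAGE_API_KEY is made up for the example.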
Using MCP Tools in Agents
from pydantic_ai import Agent
from agent_runtimes.mcp import get_mcp_toolsets
# Get pre-loaded MCP toolsets
mcp_toolsets = get_mcp_toolsets()
# Create agent with MCP tools
agent = Agent(
    "anthropic:claude-sonnet-4-20250514",
    system_prompt="You are a helpful assistant.",
    toolsets=mcp_toolsets,
)
Full Configuration Example
{
  "mcpServers": {
    "tavily-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>"
      ]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    },
    "linkedin": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/stickerdaniel/linkedin-mcp-server",
        "linkedin-mcp-server"
      ]
    },
    "kaggle": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://www.kaggle.com/mcp",
        "--header",
        "Authorization: Bearer <KAGGLE_TOKEN>"
      ]
    }
  }
}
Server-Specific Setup
Tavily MCP Server
The Tavily MCP Server provides web search and content extraction tools.
{
  "mcpServers": {
    "tavily-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>"
      ]
    }
  }
}
Filesystem MCP Server
The Filesystem MCP Server provides tools for interacting with the local filesystem.
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    }
  }
}
GitHub MCP Server
The GitHub MCP Server provides tools for interacting with GitHub repositories.
{
  "mcpServers": {
    "github": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "GITHUB_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_TOKEN": "<GITHUB_TOKEN>"
      }
    }
  }
}
Google Workspace MCP Server
The Google Workspace MCP Server provides tools for interacting with Google Workspace services like Gmail and Google Drive.
{
  "mcpServers": {
    "google-workspace": {
      "command": "uvx",
      "args": ["workspace-mcp"],
      "env": {
        "GOOGLE_OAUTH_CLIENT_ID": "<your-client-id>",
        "GOOGLE_OAUTH_CLIENT_SECRET": "<your-client-secret>"
      }
    }
  }
}
Slack MCP Server
The Slack MCP Server provides tools for interacting with Slack workspaces.
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@datalayer/slack-mcp-server"],
      "env": {
        "SLACK_BOT_TOKEN": "<your-slack-bot-token>",
        "SLACK_TEAM_ID": "<your-slack-team-id>",
        "SLACK_CHANNEL_IDS": "<your-slack-channel-ids>"
      }
    }
  }
}
Kaggle MCP Server
The Kaggle MCP Server is a remote HTTP server that provides access to Kaggle datasets, models, competitions, notebooks, and benchmarks.
Option 1: Token Authentication (recommended for Agent Runtimes)
{
  "mcpServers": {
    "kaggle": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://www.kaggle.com/mcp",
        "--header",
        "Authorization: Bearer <KAGGLE_TOKEN>"
      ]
    }
  }
}
Option 2: Browser OAuth (auto-login)
{
  "mcpServers": {
    "kaggle": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://www.kaggle.com/mcp"]
    }
  }
}
AlphaVantage MCP Server
The AlphaVantage MCP Server provides financial market data tools.
{
  "mcpServers": {
    "alphavantage": {
      "command": "uvx",
      "args": ["av-mcp==0.2.1", "<YOUR_API_KEY>"],
      "env": {"MAX_RESPONSE_TOKENS": "100000"}
    }
  }
}
Chart MCP Server
The Chart MCP Server provides charting and visualization tools.
{
  "mcpServers": {
    "chart": {
      "command": "npx",
      "args": ["-y", "@antv/mcp-server-chart"]
    }
  }
}
LinkedIn MCP Server
The LinkedIn MCP server requires browser automation via Playwright. Install the Chromium browser first, then run the server once with --get-session:
uvx --from playwright playwright install chromium
uvx --from git+https://github.com/stickerdaniel/linkedin-mcp-server linkedin-mcp-server --get-session