HUD Documentation — Evaluations and RL Environments.

The HUD SDK provides a comprehensive set of tools for environment interaction. All tools inherit from BaseTool and return standardized MCP content blocks.

Base Classes

BaseTool

from hud.tools import BaseTool

Abstract base class for all MCP tools. Provides standardized output formatting and MCP registration. Constructor Parameters:

Parameter	Type	Description	Default
`env`	`Any`	Optional stateful context (game, executor, browser)	`None`
`name`	`str`	Tool name for MCP registration	Auto from class
`title`	`str`	Human-readable display name	Auto from class
`description`	`str`	Tool description	Auto from docstring

Abstract Methods:

async def __call__(self, **kwargs: Any) -> list[ContentBlock]

Properties:

mcp - FastMCP FunctionTool wrapper for registration

Usage:

class MyTool(BaseTool):
    async def __call__(self, param: str) -> list[ContentBlock]:
        result = await self.env.do_something(param)
        return [TextContent(text=result, type="text")]

# MCPServer automatically handles BaseTool instances
tool = MyTool(env=my_context)
mcp_server.add_tool(tool)  # No need for .mcp property

BaseHub

from hud.tools import BaseHub

A composition-friendly FastMCP server for organizing tools into nested namespaces. Key Features:

Internal tool dispatcher pattern
Automatic resource catalog generation
Hidden internal tools from MCP clients

Usage:

hub = BaseHub("evaluators")

@hub.tool()  # Internal tool, hidden from agents
async def url_match(url: str) -> EvaluationResult:
    return EvaluationResult(reward=1.0, done=True)

# Access via dispatcher when mounted on server
# Agents call: evaluators(name="url_match", arguments={"url": "..."})

Agent Tools

AgentTool

from hud.tools import AgentTool

Wraps a scenario as a tool that can be called by another agent. Essential for building hierarchical agent systems where an orchestrator delegates to specialized subagents. Constructor Parameters:

Parameter	Type	Description	Default
`task`	`Task`	Task template from `env("scenario_name")`	Required
`model`	`str`	Model for subagent (via gateway)	`None`
`agent`	`type[MCPAgent]`	Custom agent class	`None`
`agent_params`	`dict`	Additional agent parameters	`{}`
`name`	`str`	Tool name for orchestrator	From scenario
`description`	`str`	Tool description	Auto-generated
`trace`	`bool`	Enable tracing for standalone runs	`False`

Must provide either model or agent, not both.

Eval-Only Parameters: Parameters with | None = None are hidden from the orchestrator but available for evaluation:

@env.scenario("investigate")
async def investigate(
    query: str,                          # Visible - orchestrator passes this
    expected_finding: str | None = None, # Hidden - only used in eval scoring
):
    response = yield f"Investigate: {query}"
    
    # Scoring uses expected_finding but orchestrator never sees it
    if expected_finding and response:
        yield 1.0 if expected_finding in response else 0.5
    else:
        yield 1.0 if response else 0.0

Usage:

from hud import Environment
from hud.tools import AgentTool

# Subagent environment with scenario
sentry_env = Environment(name="sentry-agent")

@sentry_env.scenario("investigate")
async def investigate_sentry(query: str):
    yield f"Investigate Sentry: {query}"

# Create orchestrator
orchestrator = Environment(name="orchestrator")

# Wrap subagent scenario as tool
tool = AgentTool(
    sentry_env("investigate"),  # Task template
    model="gpt-4o-mini",
    name="investigate_sentry",
    description="Investigate errors in Sentry",
)
orchestrator.add_tool(tool.mcp)

# Now orchestrator agent can call investigate_sentry(query="...")

Trace Continuity: When called from within an eval context, AgentTool automatically:

Inherits the parent’s trace_id
Skips duplicate trace registration
Routes all inference/tool calls to the parent trace

async with hud.eval(task) as ctx:
    agent = create_agent("gpt-4o")
    result = await agent.run(ctx)
    # All subagent activity appears in this single trace

To create a separate trace per AgentTool call (useful for parallel subagents), set trace_subagent=True. The tool result includes the sub-trace id in result.meta["trace_id"]. See Also: Ops Diagnostics Cookbook for a complete hierarchical agent example.

Core Tools

BashTool

from hud.tools import BashTool

Execute bash commands in a persistent shell session. Constructor Parameters:

Parameter	Type	Description	Default
`session`	`_BashSession`	Pre-configured bash session	`None`

Tool Parameters:

Parameter	Type	Description	Default
`command`	`str`	Bash command to execute	Required
`restart`	`bool`	Restart the bash session	`False`

Example:

bash = BashTool()
result = await bash(command="ls -la")
# Returns: [TextContent(text="file1.txt\nfile2.txt", type="text")]

# Restart session
await bash(restart=True)

EditTool

from hud.tools import EditTool

File system editor with undo support. Constructor Parameters:

Parameter	Type	Description	Default
`file_history`	`dict[Path, list[str]]`	Edit history per file	`None`

Commands:

Command	Parameters	Description
`view`	`path`, `view_range`	View file or directory contents
`create`	`path`, `file_text`	Create new file
`str_replace`	`path`, `old_str`, `new_str`	Replace string in file
`insert`	`path`, `insert_line`, `new_str`	Insert text at line
`undo_edit`	`path`	Undo last edit

Example:

editor = EditTool()

# View file
await editor(command="view", path="/path/to/file.py", view_range=[1, 20])

# Create file
await editor(command="create", path="/new/file.py", file_text="print('hello')")

# Replace text
await editor(command="str_replace", path="/file.py", 
            old_str="old_text", new_str="new_text")

PlaywrightTool

from hud.tools import PlaywrightTool

Web automation using Playwright browser. Constructor Parameters:

Parameter	Type	Description	Default
`page`	`Page`	Existing Playwright page	`None`
`cdp_url`	`str`	Chrome DevTools Protocol URL	`None`

Actions:

Action	Parameters	Description
`navigate`	`url`, `wait_for_load_state`	Navigate to URL
`screenshot`	-	Capture page screenshot
`click`	`selector`	Click element
`type`	`selector`, `text`	Type text in element
`get_page_info`	-	Get page title and URL
`wait_for_element`	`selector`	Wait for element to appear

Example:

tool = PlaywrightTool()

# Navigate
await tool(action="navigate", url="https://example.com")

# Click button
await tool(action="click", selector="button.submit")

# Type text
await tool(action="type", selector="input#search", text="query")

JupyterTool

from hud.tools import JupyterTool

Execute Python code in a Jupyter kernel. Constructor Parameters:

Parameter	Type	Description	Default
`url_suffix`	`str`	(Optional) Kernel gateway host:port	”localhost:8888”
`kernel_name`	`str`	(Optional) Kernel name	”python3”
`kernel_id`	`str`	(Optional) If set, connect to the existed kernel with `kernel_id`	""(empty string)

Tool Parameters:

Parameter	Type	Description	Default
`code`	`str`	Python code to execute	Required
`execution_timeout`	`int`	Execution timeout in seconds	15

Example: Tool Calling

jupyter_tool = JupyterTool()
result = await jupyter_tool(code="print(\"hello world!\")")
# Returns: [TextContent(text="hello world\n", type="text")]

Example: Reuse the same kernel

# When initializing, register the kernel
jupyter_tool = JupyterTool()
JupyterTool.register_shared_kernel("my-jupyter", jupyter_tool.get_kernel_id())

# Reuse the same kernel in other places:
jupyter_tool_reuse = JupyterTool.from_shared_kernel("my-jupyter")

Computer Control Tools

HudComputerTool

from hud.tools import HudComputerTool

Universal computer control with automatic scaling and executor selection. Constructor Parameters:

Parameter	Type	Description	Default
`executor`	`BaseExecutor`	Executor to use	Auto-detect
`platform_type`	`"auto"`, `"xdo"`, `"pyautogui"`	Executor type	`"auto"`
`display_num`	`int`	X display number	From env
`width`	`int`	Agent screen width	1280
`height`	`int`	Agent screen height	720
`rescale_images`	`bool`	Rescale screenshots	`True`

Actions:

Action	Parameters	Description
`screenshot`	-	Capture screen
`click`	`x`, `y`, `button`, `pattern`, `hold_keys`	Mouse click
`write`	`text`, `enter_after`, `delay`	Type text
`press`	`keys`	Press key combination
`scroll`	`x`, `y`, `scroll_x`, `scroll_y`	Scroll
`drag`	`start_x`, `start_y`, `end_x`, `end_y`	Drag mouse

Example:

computer = HudComputerTool()

# Take screenshot
await computer(action="screenshot")

# Click at coordinates
await computer(action="click", x=100, y=200)

# Type text
await computer(action="write", text="Hello World", enter_after=True)

# Press hotkey
await computer(action="press", keys=["ctrl", "c"])

AnthropicComputerTool

from hud.tools import AnthropicComputerTool

Computer control optimized for Anthropic’s Claude models. Features:

Pre-configured for 1280x720 resolution
Optimized action names for Claude
Built-in screenshot scaling

OpenAIComputerTool

from hud.tools import OpenAIComputerTool

Computer control optimized for OpenAI models. Features:

Pre-configured for 1920x1080 resolution
Simplified action interface
No automatic screenshot scaling

Executors

Executors provide platform-specific implementations for computer control actions.

BaseExecutor

from hud.tools.executors import BaseExecutor

Abstract base providing simulation mode for all actions. Core Methods:

click(x, y, button, pattern, hold_keys) - Mouse click
write(text, enter_after, delay) - Type text
press(keys) - Press key combination
scroll(x, y, scroll_x, scroll_y) - Scroll
drag(start_x, start_y, end_x, end_y) - Drag mouse
screenshot() - Capture screen
get_screen_size() - Get display dimensions

PyAutoGUIExecutor

from hud.tools.executors import PyAutoGUIExecutor

Cross-platform executor using PyAutoGUI library. Features:

Works on Windows, macOS, Linux
Real mouse/keyboard control
Screenshot capture
Automatic failsafe

Example:

executor = PyAutoGUIExecutor()
computer = HudComputerTool(executor=executor)

XDOExecutor

from hud.tools.executors import XDOExecutor

Linux/X11 executor using xdotool. Features:

Native X11 integration
Faster than PyAutoGUI on Linux
Support for X display selection
Window management capabilities

Example:

executor = XDOExecutor(display_num=1)
computer = HudComputerTool(executor=executor)

Common Types

ContentBlock

MCP standard output format (from mcp.types):

from mcp.types import TextContent, ImageContent

# Text output
TextContent(text="Operation complete", type="text")

# Image output  
ImageContent(data="base64_data", mimeType="image/png", type="image")

EvaluationResult

from hud.tools.types import EvaluationResult

result = EvaluationResult(
    reward=0.8,        # Score 0-1
    done=True,         # Task complete
    content="Details", # Optional text
    info={"score": 80} # Metadata
)

ContentResult

from hud.tools.types import ContentResult

# Helper for building complex outputs
result = ContentResult(
    output="Success message",
    error="Error if any",
    base64_image="screenshot_data",
    system="System message"
)

# Convert to ContentBlocks
blocks = result.to_content_blocks()

Integration Examples

Adding Tools to Environment

from hud.server import MCPServer
from hud.tools import BashTool, EditTool

mcp = MCPServer(name="my-env")

# MCPServer handles BaseTool instances automatically
bash = BashTool()
mcp.add_tool(bash)  # Internally uses bash.mcp

editor = EditTool()
mcp.add_tool(editor)  # Same here

Custom Tool Implementation

from hud.tools import BaseTool
from mcp.types import TextContent

class DatabaseTool(BaseTool):
    def __init__(self, db_connection):
        super().__init__(
            env=db_connection,
            name="database",
            title="Database Query Tool",
            description="Execute SQL queries"
        )
    
    async def __call__(self, query: str) -> list[ContentBlock]:
        try:
            results = await self.env.execute(query)
            return [TextContent(text=str(results), type="text")]
        except Exception as e:
            return [TextContent(text=f"Error: {e}", type="text")]

Hub Pattern for Evaluators

from hud.tools import BaseHub
from hud.tools.types import EvaluationResult

evaluators = BaseHub("evaluate")

@evaluators.tool("text_contains")
async def check_text(text: str, target: str) -> EvaluationResult:
    return EvaluationResult(
        reward=1.0 if target in text else 0.0,
        done=True,
        content=f"Checking if '{target}' in text"
    )

# Use in environment
@mcp.tool()
async def evaluate(name: str, **kwargs):
    return await evaluators.call_tool(name, kwargs)

Callback Functions

Callback functions help you monitor certain types of actions and hook behaviors to them. You can manage callbacks using three main methods,

add_callback(self, event_type: str, callback: Callable)
remove_callback(self, event_type: str, callback: Callable)
_trigger_callbacks(self, event_type: str, **kwargs)  

Special note: Make sure the callback functions are defined by async def For example, when you want to take a screenshot every time the webpage changes in Playwright

from hud.tools.playwright import PlaywrightTool
import base64

class MyPlaywrightTool(PlaywrightTool):
   def __init__(self, context: Any = None, cdp_url: str | None = None) -> None:
       super().__init__(cdp_url=cdp_url)
       self.screenshots: List = []
       self.add_callback("webpage_change", self._on_webpage_change)

   async def _on_webpage_change(self, **kwargs) -> None:
       """Callback to capture screenshots on webpage changes"""
       try:
           await self._ensure_browser()
           if self.page:
               screenshot_bytes = await self.page.screenshot(full_page=True)
               screenshot_b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
               self.screenshots.append(screenshot_b64)
       except Exception as e:
           pass

   async def navigate(self, url: str, wait_for_load_state="networkidle"):
       result = await super().navigate(url, wait_for_load_state)
       # Trigger callbacks with event data
       await self._trigger_callbacks("webpage_change")
       return result

   async def click(self, selector: str, **kwargs):
       result = await super().click(selector, **kwargs)
       await self._trigger_callbacks("webpage_change")
       return result

Get Started

Essentials

Guides

Cookbooks

Advanced

SDK Reference

CLI Reference

Community

Tools

Base Classes

BaseTool

BaseHub

Agent Tools

AgentTool

Core Tools

BashTool

EditTool

PlaywrightTool

JupyterTool

Computer Control Tools

HudComputerTool

AnthropicComputerTool

OpenAIComputerTool

Executors

BaseExecutor

PyAutoGUIExecutor

XDOExecutor

Common Types

ContentBlock

EvaluationResult

ContentResult

Integration Examples

Adding Tools to Environment

Custom Tool Implementation

Hub Pattern for Evaluators

Callback Functions

See Also

Get Started

Essentials

Guides

Cookbooks

Advanced

SDK Reference

CLI Reference

Community

​Base Classes

​BaseTool

​BaseHub

​Agent Tools

​AgentTool

​Core Tools

​BashTool

​EditTool

​PlaywrightTool

​JupyterTool

​Computer Control Tools

​HudComputerTool

​AnthropicComputerTool

​OpenAIComputerTool

​Executors

​BaseExecutor

​PyAutoGUIExecutor

​XDOExecutor

​Common Types

​ContentBlock

​EvaluationResult

​ContentResult

​Integration Examples

​Adding Tools to Environment

​Custom Tool Implementation

​Hub Pattern for Evaluators

​Callback Functions

​See Also

Base Classes

BaseTool

BaseHub

Agent Tools

AgentTool

Core Tools

BashTool

EditTool

PlaywrightTool

JupyterTool

Computer Control Tools

HudComputerTool

AnthropicComputerTool

OpenAIComputerTool

Executors

BaseExecutor

PyAutoGUIExecutor

XDOExecutor

Common Types

ContentBlock

EvaluationResult

ContentResult

Integration Examples

Adding Tools to Environment

Custom Tool Implementation

Hub Pattern for Evaluators

Callback Functions

See Also