The HUD SDK provides a base MCPAgent class and several pre-built agent implementations for interacting with MCP environments.
## Creating Agents
Use the `create()` factory method to instantiate agents with typed parameters:
```python
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create(
    checkpoint_name="claude-sonnet-4-5",
    max_tokens=8192,
    verbose=True,
)

result = await agent.run(task, max_steps=20)
```
Direct constructor calls with kwargs are deprecated. Use `Agent.create()` instead.
## Base Class

### MCPAgent
```python
from hud.agents import MCPAgent
```
Abstract base class for all MCP-enabled agents. Handles the agent loop, MCP client lifecycle, tool discovery/filtering, and telemetry.
Create Parameters (shared by all agents):
| Parameter | Type | Description | Default |
|---|---|---|---|
| `mcp_client` | `AgentMCPClient` | MCP client for server connections | `None` |
| `auto_trace` | `bool` | Enable automatic tracing spans | `True` |
| `auto_respond` | `bool` | Use ResponseAgent to decide when to stop/continue | `False` |
| `verbose` | `bool` | Verbose console logs for development | `False` |
Base Config (shared by all agents):
| Parameter | Type | Description | Default |
|---|---|---|---|
| `allowed_tools` | `list[str]` | Tool patterns to expose to the model | `None` (all) |
| `disallowed_tools` | `list[str]` | Tool patterns to hide from the model | `None` |
| `system_prompt` | `str` | Custom system prompt | `None` |
| `append_setup_output` | `bool` | Include setup output in first turn | `True` |
| `initial_screenshot` | `bool` | Include screenshot in initial context | `True` |
| `response_tool_name` | `str` | Lifecycle tool for submitting responses | `None` |
Key Methods:
```python
@classmethod
def create(cls, **kwargs) -> MCPAgent:
    """Factory method to create an agent with typed parameters."""

async def run(prompt_or_task: str | Task | dict, max_steps: int = 10) -> Trace:
    """Run agent with prompt or task. Returns Trace with results."""

async def call_tools(tool_call: MCPToolCall | list[MCPToolCall]) -> list[MCPToolResult]:
    """Execute tool calls through MCP client."""

def get_available_tools() -> list[types.Tool]:
    """Get filtered list of available tools."""
```
## Pre-built Agents

### ClaudeAgent
```python
from hud.agents import ClaudeAgent
```
Claude-specific implementation using Anthropic's API.
Config Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `checkpoint_name` | `str` | Claude model to use | `"claude-sonnet-4-5"` |
| `model_client` | `AsyncAnthropic` | Anthropic client | Auto-created |
| `max_tokens` | `int` | Maximum response tokens | `16384` |
| `use_computer_beta` | `bool` | Enable computer-use beta features | `True` |
| `validate_api_key` | `bool` | Validate key on init | `True` |
Example:
```python
from hud import Environment
from hud.agents import ClaudeAgent

env = Environment("browser").connect_hub("hud-evals/browser")

agent = ClaudeAgent.create(
    checkpoint_name="claude-sonnet-4-5",
    max_tokens=8192,
)

# Create task from scenario
task = env("navigate", url="https://example.com")
result = await agent.run(task)
```
### OpenAIAgent
```python
from hud.agents import OpenAIAgent
```
OpenAI agent using the Responses API for function calling.
Config Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `checkpoint_name` | `str` | Model to use | `"gpt-5.1"` |
| `model_client` | `AsyncOpenAI` | OpenAI client | Auto-created |
| `max_output_tokens` | `int` | Maximum response tokens | `None` |
| `temperature` | `float` | Sampling temperature | `None` |
| `reasoning` | `Reasoning` | Reasoning configuration | `None` |
| `tool_choice` | `ToolChoice` | Tool selection strategy | `None` |
| `parallel_tool_calls` | `bool` | Enable parallel tool execution | `None` |
| `validate_api_key` | `bool` | Validate key on init | `True` |
Example:
```python
agent = OpenAIAgent.create(
    checkpoint_name="gpt-4o",
    max_output_tokens=2048,
    temperature=0.7,
)
```
### OperatorAgent
```python
from hud.agents import OperatorAgent
```
OpenAI Operator-style agent with computer-use capabilities. Extends `OpenAIAgent`.
Config Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `checkpoint_name` | `str` | Model to use | `"computer-use-preview"` |
| `environment` | `Literal["windows", "mac", "linux", "browser"]` | Computer environment | `"linux"` |
Inherits all OpenAIAgent parameters.
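For example, targeting a browser environment (a sketch combining the parameters from the tables above; not verified against a live environment):

```python
from hud.agents import OperatorAgent

agent = OperatorAgent.create(
    checkpoint_name="computer-use-preview",
    environment="browser",
    max_output_tokens=2048,  # inherited from OpenAIAgent
)
```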
### GeminiAgent
```python
from hud.agents import GeminiAgent
```
Google Gemini agent with native computer-use capabilities.
Config Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `checkpoint_name` | `str` | Gemini model to use | `"gemini-2.5-computer-use-preview-10-2025"` |
| `model_client` | `genai.Client` | Gemini client | Auto-created |
| `temperature` | `float` | Sampling temperature | `1.0` |
| `top_p` | `float` | Top-p sampling | `0.95` |
| `top_k` | `int` | Top-k sampling | `40` |
| `max_output_tokens` | `int` | Maximum response tokens | `8192` |
| `excluded_predefined_functions` | `list[str]` | Predefined functions to exclude | `[]` |
| `validate_api_key` | `bool` | Validate key on init | `True` |
Example:
```python
agent = GeminiAgent.create(
    checkpoint_name="gemini-2.5-computer-use-preview-10-2025",
    temperature=0.7,
    max_output_tokens=4096,
)
```
### OpenAIChatAgent
```python
from hud.agents import OpenAIChatAgent
```
OpenAI-compatible chat.completions agent. Works with any endpoint implementing the OpenAI schema (vLLM, Ollama, Together, etc.).
Config Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `checkpoint_name` | `str` | Model name | `"gpt-5-mini"` |
| `openai_client` | `AsyncOpenAI` | OpenAI-compatible client | `None` |
| `api_key` | `str` | API key (if not using client) | `None` |
| `base_url` | `str` | Base URL (if not using client) | `None` |
| `completion_kwargs` | `dict` | Extra args for completions | `{}` |
Example:
```python
from hud.agents import OpenAIChatAgent

# Using base_url and api_key
agent = OpenAIChatAgent.create(
    base_url="http://localhost:11434/v1",  # Ollama
    api_key="not-needed",
    checkpoint_name="llama3.1",
    completion_kwargs={"temperature": 0.2},
)

# Or with a custom client
from openai import AsyncOpenAI

agent = OpenAIChatAgent.create(
    openai_client=AsyncOpenAI(base_url="http://localhost:8000/v1"),
    checkpoint_name="served-model",
)
```
## Usage Examples

### With Scenarios
```python
from hud import Environment
from hud.agents import ClaudeAgent

# Define environment with scenario
env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("shopping")
async def shopping(instruction: str, start_url: str):
    navigate(url=start_url)  # Direct function call for local tools
    answer = yield instruction
    result = check_cart()
    yield 1.0 if result["has_items"] else 0.0

# Run agent on task
agent = ClaudeAgent.create()
task = env("shopping", instruction="Add laptop to cart", start_url="https://shop.example.com")
result = await agent.run(task, max_steps=20)
print(f"Reward: {result.reward}, Done: {result.done}")
```
### With Remote Environment
```python
from hud import Environment
from hud.agents import OperatorAgent

# Connect to a remote environment
env = Environment("browser").connect_hub("hud-evals/browser")

# Create task from remote scenario
task = env("web-task", instruction="Find the price of the product")

agent = OperatorAgent.create()
result = await agent.run(task, max_steps=20)
```
### Auto-Respond Mode

When `auto_respond=True`, the agent uses a `ResponseAgent` to decide whether to continue or stop after each model response:
```python
agent = ClaudeAgent.create(
    auto_respond=True,  # Uses HUD inference gateway
    verbose=True,
)
```
## See Also