v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Migrate to @env.scenario() for new code.

MCPServer → Environment

Environment inherits from MCPServer. Same API, same behavior. Just change the import:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()
# After
from hud import Environment
env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.

Migrating Tasks: Prompt Passthrough Pattern

The recommended migration uses the prompt passthrough pattern—scenario arguments become the prompt content.

Local Environments (Most Common)

For local environments where tools are defined in the same file, call tool functions directly:
from hud import Environment

env = Environment("shopping")

@env.tool()
def navigate(url: str) -> str:
    """Navigate to a URL."""
    return f"Navigated to {url}"

@env.tool()
def check_cart() -> dict:
    """Check the shopping cart."""
    return {"has_items": True, "total": 99.99}

@env.scenario("buy-item")
async def buy_item(instruction: str, start_url: str = "https://example.com"):
    # Call tools directly as functions
    navigate(url=start_url)
    
    answer = yield instruction
    
    result = check_cart()
    yield 1.0 if result["has_items"] else 0.0

# Create tasks
task1 = env("buy-item", instruction="Add a laptop to cart")
task2 = env("buy-item", instruction="Find the cheapest item", start_url="https://store.example.com")
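The two yields split a scenario into setup → prompt → evaluation. The control flow can be sketched in pure Python with a sync generator standing in for hud's async runner (`run_scenario` here is illustrative, not the real API):

```python
def buy_item(instruction):
    # Setup phase would run here (e.g. navigate(url=...)), before the agent starts
    # First yield hands the prompt to the agent; execution pauses here
    answer = yield instruction
    # Evaluation phase: resumes after the agent submits an answer
    yield 1.0 if answer else 0.0

def run_scenario(scenario, agent):
    prompt = next(scenario)        # advance to the first yield -> the prompt
    answer = agent(prompt)         # the agent produces an answer
    return scenario.send(answer)   # resume; the second yield is the reward

reward = run_scenario(buy_item("Add a laptop to cart"), agent=lambda p: "done")
print(reward)  # 1.0
```

The same shape holds in hud's async scenarios: everything before the first yield is setup, everything after it is scoring.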

Remote Environments

When connecting to a remote environment via connect_hub(), use call_tool() since tools are defined remotely:
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("web-task")
async def web_task(instruction: str, start_url: str = "https://example.com"):
    # Use call_tool for remote tools
    await env.call_tool("navigate", url=start_url)
    
    answer = yield instruction
    
    result = await env.call_tool("check_completion")
    yield 1.0 if result["success"] else 0.0

task1 = env("web-task", instruction="Find the contact page and extract the support email")
Direct calls vs call_tool(): Use direct function calls for local tools (defined in the same Environment). Use call_tool() only when connecting to remote environments where tools are defined elsewhere.
This pattern:
  • Args ARE the prompt: The instruction flows directly through as the agent’s task
  • Enables parametric evaluation: Same scenario, different instructions
  • Replaces hardcoded prompts: Instead of LegacyTask(prompt="..."), pass the prompt as an arg
  • Type-safe: Arguments are validated against the scenario signature
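The validation in the last bullet amounts to binding the provided arguments against the scenario function's signature. A minimal sketch of the idea using the standard library (`validate_args` is a hypothetical helper, not hud's validator):

```python
import inspect

def validate_args(scenario_fn, **kwargs):
    """Bind kwargs against the scenario's signature, as a validator might."""
    inspect.signature(scenario_fn).bind(**kwargs)  # TypeError on bad args

async def buy_item(instruction: str, start_url: str = "https://example.com"):
    yield instruction  # minimal scenario body for illustration

validate_args(buy_item, instruction="Add a laptop to cart")  # ok: start_url has a default

err = None
try:
    validate_args(buy_item, instrction="typo")  # misspelled keyword argument
except TypeError as e:
    err = e
print("rejected:", err)
```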

Before/After Comparison

# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
task = LegacyTask(
    prompt="Find all products under $50 and add the cheapest to cart",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://shop.example.com"}},
    evaluate_tool={"name": "check_cart", "arguments": {}}
)

# AFTER - Prompt passthrough pattern (local environment)
@env.scenario("shopping")
async def shopping(task: str, shop_url: str):
    navigate(url=shop_url)  # Direct function call
    
    answer = yield task  # The task arg IS the prompt
    
    result = check_cart()  # Direct function call
    yield 1.0 if result["has_items"] else 0.0

# Now create multiple tasks with different instructions
tasks = [
    env("shopping", task="Find all products under $50 and add the cheapest to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Search for 'laptop' and add the first result to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Apply promo code SAVE20 at checkout", shop_url="https://shop.example.com"),
]

The Migration Rule

  • prompt → scenario arg (passthrough)
  • setup_tool → code before first yield
  • evaluate_tool → code after first yield

Multiple setup_tool Calls

If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
]

# AFTER (local environment - direct calls)
@env.scenario("authenticated-task")
async def authenticated_task(instruction: str, username: str):
    navigate(url="https://app.example.com")
    login(user=username)
    
    answer = yield instruction
    
    result = check_completion()
    yield 1.0 if result else 0.0
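Because setup is ordinary code before the first yield, a failing setup step raises before the agent ever receives a prompt. A pure-Python illustration of that ordering (a sync stand-in, not the hud runner itself):

```python
def authenticated_task(instruction, username):
    # Setup steps run in order; any exception here aborts the scenario
    if not username:
        raise ValueError("login failed")   # second setup step fails
    answer = yield instruction             # agent only runs if setup succeeded
    yield 1.0 if answer else 0.0

gen = authenticated_task("Do the thing", username="")
setup_error = None
try:
    next(gen)  # setup runs here; the prompt is never produced
except ValueError as e:
    setup_error = e
print("setup aborted:", setup_error)
```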

JSON Task Format (Platform Ready)

For JSON-based task definitions that can be uploaded to the HUD platform, use this format:
{
  "env": {
    "name": "hud-evals/browser"
  },
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}
This maps directly to the scenario call: env("web-task", instruction="...", start_url="...").

Example: Task set for platform upload
[
  {
    "env": { "name": "hud-ops-diagnostics-sentry" },
    "scenario": "sentry-agent:investigate",
    "args": {
      "issue_id": "PROJ-1234",
      "max_depth": 3
    }
  },
  {
    "env": { "name": "hud-evals/browser" },
    "scenario": "web-task",
    "args": {
      "instruction": "Add a MacBook Pro to cart and proceed to checkout"
    }
  }
]
The args field uses prompt passthrough—the values flow directly into the scenario’s yield statement.
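The JSON-to-call mapping is mechanical: the args object unpacks as keyword arguments. A self-contained sketch with a stub standing in for a real Environment instance (swap in your env):

```python
import json

task_json = """
{
  "env": {"name": "hud-evals/browser"},
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}
"""

def make_task(env, spec):
    # env("scenario-name", **args) is the call the JSON encodes
    return env(spec["scenario"], **spec["args"])

def stub_env(scenario, **args):
    # Stub in place of an Environment, just to show the mapping
    return {"scenario": scenario, "args": args}

spec = json.loads(task_json)
task = make_task(stub_env, spec)
print(task["scenario"])           # web-task
print(task["args"]["start_url"])  # https://example.com
```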

Using with Built-in Agents

Built-in agents work with scenarios:
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(env("web-task", instruction="Find the pricing page"))

Bring Your Own Agent

v5 gives you the hud.eval() context manager for maximum flexibility:
async with hud.eval(env("shopping", task="Add item to cart", shop_url="https://shop.example.com")) as ctx:
    # Use OpenAI, Anthropic, your own agent—whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)
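The "handle tool calls, run your agent loop" step above is the standard chat-completions tool loop. A self-contained sketch of its shape, with stubs in place of the real client and tools (in a real run, tool execution would go through the environment; all names here are illustrative):

```python
# Stub tool registry standing in for the environment's tools
TOOLS = {"check_cart": lambda: {"has_items": True, "total": 99.99}}

def agent_loop(client, messages, max_turns=5):
    """Drive the model until it stops requesting tools, then return its answer."""
    for _ in range(max_turns):
        response = client(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["content"]  # final answer, ready for ctx.submit(...)
        for call in tool_calls:
            result = TOOLS[call["name"]](**call.get("arguments", {}))
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    return None

def stub_client(messages):
    # Stub model: requests one tool call, then answers
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "check_cart", "arguments": {}}]}
    return {"content": "Cart has items totalling 99.99", "tool_calls": None}

answer = agent_loop(stub_client, [{"role": "user", "content": "Check my cart"}])
print(answer)
```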

Quick Reference

v4 (deprecated in v0.5.0, removed in v0.6.0) → v5 (recommended)
  • LegacyTask(prompt=...) → env("scenario", instruction=...) — prompt passthrough
  • setup_tool → code before first yield in @env.scenario()
  • evaluate_tool → code after first yield in @env.scenario()
  • MCPServer → Environment (drop-in replacement)
  • JSON with mcp_config + prompt → JSON with env + scenario + args