v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Migrate to @env.scenario() for new code.
MCPServer → Environment
Environment inherits from MCPServer. Same API, same behavior. Just change the import:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")
@mcp.tool()
def my_tool(): ...
mcp.run()
# After
from hud import Environment
env = Environment("my-env")
@env.tool()
def my_tool(): ...
env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
Migrating Tasks: Prompt Passthrough Pattern
The recommended migration uses the prompt passthrough pattern—scenario arguments become the prompt content.
Local Environments (Most Common)
For local environments where tools are defined in the same file, call tool functions directly:
from hud import Environment
env = Environment("shopping")
@env.tool()
def navigate(url: str) -> str:
"""Navigate to a URL."""
return f"Navigated to {url}"
@env.tool()
def check_cart() -> dict:
"""Check the shopping cart."""
return {"has_items": True, "total": 99.99}
@env.scenario("buy-item")
async def buy_item(instruction: str, start_url: str = "https://example.com"):
# Call tools directly as functions
navigate(url=start_url)
answer = yield instruction
result = check_cart()
yield 1.0 if result["has_items"] else 0.0
# Create tasks
task1 = env("buy-item", instruction="Add a laptop to cart")
task2 = env("buy-item", instruction="Find the cheapest item", start_url="https://store.example.com")
Remote Environments
When connecting to a remote environment via connect_hub(), use call_tool() since tools are defined remotely:
from hud import Environment
env = Environment("browser").connect_hub("hud-evals/browser")
@env.scenario("web-task")
async def web_task(instruction: str, start_url: str = "https://example.com"):
# Use call_tool for remote tools
await env.call_tool("navigate", url=start_url)
answer = yield instruction
result = await env.call_tool("check_completion")
yield 1.0 if result["success"] else 0.0
task1 = env("web-task", instruction="Find the contact page and extract the support email")
Direct calls vs call_tool(): Use direct function calls for local tools (defined in the same Environment). Use call_tool() only when connecting to remote environments where tools are defined elsewhere.
This pattern has several benefits:
- Args ARE the prompt: the instruction flows directly through as the agent's task
- Enables parametric evaluation: the same scenario runs with different instructions
- Replaces hardcoded prompts: instead of LegacyTask(prompt="..."), pass the prompt as an arg
- Type-safe: arguments are validated against the scenario signature
Before/After Comparison
# BEFORE (removed in v0.6.0)
task = LegacyTask(
prompt="Find all products under $50 and add the cheapest to cart",
mcp_config={"hud": {...}},
setup_tool={"name": "navigate", "arguments": {"url": "https://shop.example.com"}},
evaluate_tool={"name": "check_cart", "arguments": {}}
)
# AFTER - Prompt passthrough pattern (local environment)
@env.scenario("shopping")
async def shopping(task: str, shop_url: str):
navigate(url=shop_url) # Direct function call
answer = yield task # The task arg IS the prompt
result = check_cart() # Direct function call
yield 1.0 if result["has_items"] else 0.0
# Now create multiple tasks with different instructions
tasks = [
env("shopping", task="Find all products under $50 and add the cheapest to cart", shop_url="https://shop.example.com"),
env("shopping", task="Search for 'laptop' and add the first result to cart", shop_url="https://shop.example.com"),
env("shopping", task="Apply promo code SAVE20 at checkout", shop_url="https://shop.example.com"),
]
The Migration Rule
- prompt → scenario arg (passthrough)
- setup_tool → code before the first yield
- evaluate_tool → code after the first yield
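Taken together, the rule maps onto a scenario skeleton like this (a minimal sketch; navigate and check_completion stand in for whatever tools your environment defines):
@env.scenario("my-task")
async def my_task(instruction: str):
    navigate(url="https://example.com")   # setup_tool → code before the first yield
    answer = yield instruction            # prompt → passed through as a scenario arg
    result = check_completion()           # evaluate_tool → code after the first yield
    yield 1.0 if result else 0.0          # the final yield is the reward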
If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
{"name": "navigate", "arguments": {"url": "..."}},
{"name": "login", "arguments": {"user": "..."}},
]
# AFTER (local environment - direct calls)
@env.scenario("authenticated-task")
async def authenticated_task(instruction: str, username: str):
navigate(url="https://app.example.com")
login(user=username)
answer = yield instruction
result = check_completion()
yield 1.0 if result else 0.0
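The same idea covers multiple evaluate tools: run each check after the first yield and combine them into a single reward. A sketch for a local environment, where check_cart is the tool defined earlier and check_order_history is a hypothetical second check:
@env.scenario("multi-check")
async def multi_check(instruction: str):
    navigate(url="https://shop.example.com")
    answer = yield instruction
    cart = check_cart()                  # first check (defined above)
    history = check_order_history()      # second check (hypothetical tool)
    # Combine the checks however you like; the final yield is the reward
    yield 0.5 * (1.0 if cart["has_items"] else 0.0) + 0.5 * (1.0 if history else 0.0)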
JSON Task Definitions
For JSON-based task definitions that can be uploaded to the HUD platform, use this format:
{
"env": {
"name": "hud-evals/browser"
},
"scenario": "web-task",
"args": {
"instruction": "Find the contact page and extract the support email",
"start_url": "https://example.com"
}
}
This maps directly to the scenario call: env("web-task", instruction="...", start_url="...").
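To make the mapping concrete, the same JSON can be turned back into a task in Python. A minimal sketch, assuming env is the connected Environment from the remote example above (with the web-task scenario already defined) and that the definition lives in a local task.json file:
import json

# Load the JSON task definition (hypothetical local file)
with open("task.json") as f:
    spec = json.load(f)

# "scenario" and "args" map directly onto the scenario call
task = env(spec["scenario"], **spec["args"])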
Example: Task set for platform upload
[
{
"env": { "name": "hud-ops-diagnostics-sentry" },
"scenario": "sentry-agent:investigate",
"args": {
"issue_id": "PROJ-1234",
"max_depth": 3
}
},
{
"env": { "name": "hud-evals/browser" },
"scenario": "web-task",
"args": {
"instruction": "Add a MacBook Pro to cart and proceed to checkout"
}
}
]
The args field uses prompt passthrough: the values bind to the scenario's parameters, and the instruction flows through to the agent at the first yield.
Using with Built-in Agents
Built-in agents work with scenarios:
from hud.agents import ClaudeAgent
agent = ClaudeAgent.create()
result = await agent.run(env("web-task", instruction="Find the pricing page"))
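The same call works across a whole task list, such as the parametric shopping tasks created earlier. A sketch, assuming tasks is that list:
results = []
for task in tasks:
    results.append(await agent.run(task))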
Bring Your Own Agent
v5 gives you the hud.eval() context manager for maximum flexibility:
import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()  # bring any client or agent loop you like

async with hud.eval(env("shopping", task="Add item to cart", shop_url="https://shop.example.com")) as ctx:
# Use OpenAI, Anthropic, your own agent—whatever you want
response = await client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": ctx.prompt}],
tools=ctx.as_openai_chat_tools()
)
# Handle tool calls, run your agent loop...
await ctx.submit(response.choices[0].message.content)
print(ctx.reward)
Quick Reference
| v4 (removed in v0.6.0) | v5 (recommended) |
|---|---|
| LegacyTask(prompt=...) | env("scenario", instruction=...) — prompt passthrough |
| setup_tool | Code before first yield in @env.scenario() |
| evaluate_tool | Code after first yield in @env.scenario() |
| MCPServer | Environment (drop-in replacement) |
| JSON with mcp_config + prompt | JSON with env + scenario + args |