Before deploying, test locally. See Sandboxing for Docker vs no-Docker patterns.
Local Testing
| Environment | local_test.py |
|---|---|
| No Docker | from env import env |
| Docker | env.connect_url("http://localhost:8765/mcp") |
Both use the same API after setup:
async with env:
    tools = env.as_tools()  # List available tools
    result = await env.call_tool("my_tool", arg="val")  # Call a tool
Testing Scenarios Directly
Scenarios are async generators. hud.eval() drives them automatically, but you can test the logic directly—this is exactly what runs at the start and end of hud.eval():
async def checkout(user_id: str, amount: int = 100):
    # Setup + prompt (first yield): runs at hud.eval() start
    answer = yield f"Complete checkout for {user_id}, ${amount}"
    # Evaluation (second yield): runs after agent submits
    yield 1.0 if "success" in answer.lower() else 0.0

async def test():
    gen = checkout("alice", 50)
    prompt = await anext(gen)  # What hud.eval() does at start
    reward = await gen.asend("Success!")  # What hud.eval() does after submit
    assert reward == 1.0
If your scenario tests pass, hud.eval() will behave identically.
Mocking
env.mock() intercepts at the tool layer—agents only see tools:
env.mock() # All tools return fake responses
env.mock_tool("send_email", {"status": "sent"})
# Check mock state
assert env.is_mock
Hot-Reload
For Docker environments, hud dev -w path reloads Python on save:
hud dev -w scenarios -w tools --port 8765
System services (postgres, VNC, browsers) persist across reloads.
Debugging Build Failures
hud build runs the same pipeline as New → Environment on hud.ai, so a build that passes locally will also work in production. If the build fails or the container crashes on startup, run hud debug, which performs a 5-phase compliance test.
The output shows exactly which phase failed:
✓ Phase 1: Docker image exists
✓ Phase 2: MCP server responds to initialize
✗ Phase 3: Tool discovery failed
→ Error: Connection refused on port 8005
→ Hint: Backend service may not be starting
You can also debug a directory (builds first) or stop at a specific phase:
hud debug . # Build and debug current directory
hud debug . --max-phase 3 # Stop after phase 3
hud debug --config mcp.json # Debug from config file
Scenario MCP Protocol Mapping
Understanding how scenarios map to MCP is crucial for debugging. Each scenario registers two MCP endpoints:
| Phase | MCP Type | Endpoint | What it does |
|---|---|---|---|
| Setup | Prompt | get_prompt("{env}:{scenario}", args) | Runs code before first yield, returns the prompt |
| Evaluate | Resource | read_resource("{env}:{scenario}") | Runs code after first yield, returns {"reward": float} |
Debug with raw MCP calls
If a scenario isn’t working, test each phase directly:
async with env:
    # Phase 1: Setup (runs code before first yield)
    prompt_result = await env.get_prompt(
        "myenv:checkout",
        {"product": "laptop", "user_id": "alice"}
    )
    print(f"Prompt: {prompt_result.messages[0].content}")

    # ... agent runs here ...

    # Phase 2: Submit answer (stores it for evaluation)
    await env.submit("checkout", answer="Order completed successfully")

    # Phase 3: Evaluate (runs code after first yield)
    resource_result = await env.read_resource("myenv:checkout")
    print(f"Reward: {resource_result}")  # {"reward": 1.0}
Common debugging scenarios
Problem: evaluate_tool: NULL but using v5 scenarios
- Cause: v5 scenarios don’t use evaluate_tool; they return rewards via read_resource
- Fix: Ensure your orchestrator calls read_resource() after agent completion
Problem: TypeError when evaluating with complex args like list[dict]
- Cause: MCP passes all arguments as strings; the SDK deserializes them
- Debug: Add logging to check type(arg) at scenario entry
Problem: Scenario setup works but evaluate returns no reward
- Cause: submit() wasn’t called before read_resource()
- Fix: Call await env.submit(scenario_name, answer) first
Useful Environment Properties
# Check parallelization (for running multiple evals)
env.is_parallelizable # True if all connections are remote
# List what's connected
env.connections # Dict of connection names → connectors
env.is_connected # True if in async context
# Resources and prompts (beyond tools)
await env.list_resources() # MCP resources
await env.list_prompts() # MCP prompts