# Local Testing
| Environment | `local_test.py` |
|---|---|
| No Docker | `from env import env` |
| Docker | `env.connect_url("http://localhost:8765/mcp")` |
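A bare-bones `local_test.py` following the table might look like this; the Docker URL is the one shown above, and the `USE_DOCKER` flag is purely illustrative:

```python
# local_test.py, minimal sketch. Without Docker, import env directly from
# your environment package; with Docker, connect to the served MCP endpoint.
from env import env

USE_DOCKER = False  # flip when the environment runs in a container

if USE_DOCKER:
    env.connect_url("http://localhost:8765/mcp")
```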
## Testing Scenarios Directly
Scenarios are async generators. `hud.eval()` drives them automatically, but you can also test the logic directly; this is exactly what runs at the start and end of `hud.eval()`:
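For instance, a minimal sketch of driving a scenario by hand; `my_scenario`, its `item` argument, and the stand-in answer are hypothetical, and `asend()` here plays the role that `env.submit()` plays in production:

```python
import asyncio
from env import my_scenario  # hypothetical: a scenario defined in your env package

async def main():
    gen = my_scenario(item="example")   # arguments are illustrative
    prompt = await anext(gen)           # setup phase: code before the first yield
    print("Prompt:", prompt)

    answer = "stand-in agent answer"    # what env.submit() would deliver in production
    reward = await gen.asend(answer)    # evaluate phase: code after the first yield
    print("Reward:", reward)

asyncio.run(main())
```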
If this runs cleanly, `hud.eval()` will behave identically.
## Mocking
`env.mock()` intercepts at the tool layer; agents only see tools:
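A rough sketch of a mocked run; only `env.mock()` comes from the section above, while `my_scenario`, its argument, and the canned answer are assumptions:

```python
import asyncio
from env import env, my_scenario  # my_scenario is a hypothetical scenario

env.mock()  # intercept tool calls at the tool layer; tools return canned values

async def main():
    gen = my_scenario(item="example")
    print("Prompt:", await anext(gen))        # setup runs against mocked tools
    print("Reward:", await gen.asend("ok"))   # evaluate runs against mocked tools

asyncio.run(main())
```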
## Hot-Reload
For Docker environments, `hud dev -w path` reloads Python on save.
## Debugging Build Failures
`hud build` runs the exact same pipeline as New → Environment on hud.ai, so if it passes locally, it'll work in production. If the build fails or the container crashes on startup, use `hud debug` to run a 5-phase compliance test.
## Scenario MCP Protocol Mapping
Understanding how scenarios map to MCP is crucial for debugging. Each scenario registers two MCP endpoints:

| Phase | MCP Type | Endpoint | What it does |
|---|---|---|---|
| Setup | Prompt | `get_prompt("{env}:{scenario}", args)` | Runs code before the first `yield`, returns the prompt |
| Evaluate | Resource | `read_resource("{env}:{scenario}")` | Runs code after the first `yield`, returns `{"reward": float}` |
### Debug with raw MCP calls
If a scenario isn't working, test each phase directly.
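A rough sketch using the official `mcp` Python client; the URL, the `myenv:checkout` name, and the arguments are assumptions, so substitute your own environment and scenario names:

```python
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:8765/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Setup phase: runs code before the first yield, returns the prompt
            prompt = await session.get_prompt("myenv:checkout", {"item": "example"})
            print("Prompt:", prompt)

            # ...run your agent against the server's tools here, and remember
            # the answer must be submitted (e.g. via env.submit) before evaluating...

            # Evaluate phase: runs code after the first yield, returns the reward
            result = await session.read_resource("myenv:checkout")
            print("Reward payload:", result)

asyncio.run(main())
```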
### Common debugging scenarios

**Problem:** `evaluate_tool: NULL` but using v5 scenarios

- **Cause:** v5 scenarios don't use `evaluate_tool`; they return rewards via `read_resource`
- **Fix:** Ensure your orchestrator calls `read_resource()` after agent completion
**Problem:** `TypeError` when evaluating with complex args like `list[dict]`

- **Cause:** MCP passes all arguments as strings; the SDK deserializes them
- **Debug:** Add logging to check `type(arg)` at scenario entry
- **Cause:** `submit()` wasn't called before `read_resource()`
- **Fix:** Call `await env.submit(scenario_name, answer)` first
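As a hedged sketch of that ordering inside a hypothetical orchestrator helper, where `env` is your connected environment and `session` an MCP `ClientSession` as in the raw-call example above; the names and arguments are assumptions:

```python
# Hypothetical helper showing the required order: submit the agent's answer
# first, then read the evaluate resource. Names are placeholders.
async def finish_and_score(env, session, answer: str):
    await env.submit("checkout", answer)                   # 1. record the answer
    return await session.read_resource("myenv:checkout")   # 2. reward is now computed
```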