AGENT-NATIVE CONTRACT
SeaOtter's thesis is that agents iterate against the hostile critic at scale. An AI agent that lands here should therefore be able to discover, on its own, how to get a key, connect over MCP or HTTP, and run the OtterScore loop, entirely from machine-readable artifacts. The entry point is /llms.txt.
THE LOOP
1. Read /llms.txt (or /.well-known/llms.txt); the OpenAPI spec and interactive docs carry full schemas.
2. A signed-in org user mints an eval key at /developers (POST /api/v1/agent-keys); the sk-otter-<40 hex> secret is shown once. Hand it to the agent.
3. Paste the .mcp.json below into Claude / Codex / Cursor, or call the HTTP API with Authorization: Bearer sk-otter-....
4. POST /api/v1/eval/feedback with the artifact + the prompt the agent was given.
5. Read the verdict: each finding carries criterion, severity, evidence, detail, anchor; upgrades[] are concrete fixes.
6. Revise and POST /api/v1/eval/runs/{id}/iterate until band clears the gate (e.g. ship).
7. POST /api/v1/eval/workflows/{id}/topology for a composite + chain critique.

Connect over MCP
.mcp.json (Claude / Cursor). Codex uses [mcp_servers.otterloop] in config.toml.
{
  "mcpServers": {
    "otterloop": {
      "command": "python",
      "args": ["-m", "otterloop.mcp_server"],
      "env": {
        "OTTERLOOP_API_URL": "https://api.seaotter.ai",
        "OTTERLOOP_API_KEY": "sk-otter-...",
        "OTTERLOOP_POLICY_ID": "acme-prod-acceptance"
      }
    }
  }
}

Score over HTTP
One-shot grade -> verdict + run_id to keep iterating.
curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'

API SURFACE
Bases: https://api.seaotter.ai (prod) / https://dev-api.seaotter.ai (dev).
| Method | Path | What it does |
|---|---|---|
| GET | /api/v1/eval/policies | Org acceptance policies to condition grading on. |
| GET | /api/v1/eval/rubrics | List rubrics (acceptance criteria); /{id} for one. |
| POST | /api/v1/eval/feedback | One-shot grade -> { run_id, verdict }. |
| POST | /api/v1/eval/runs | Create a run + first verdict (full conditioning slots). |
| POST | /api/v1/eval/runs/{id}/iterate | Submit a revision, get the next verdict. |
| GET | /api/v1/eval/runs/{id}/score | Fetch a run's latest score. |
| POST | /api/v1/eval/workflows/{id}/topology | Workflow composite + per-step + chain critique. |
| GET/POST | /api/v1/agent-keys | List / mint eval keys (signed-in org user, not an eval key). |
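The feedback and iterate endpoints in the table above can be chained into the loop from an agent's side. The following is a minimal Python sketch under stated assumptions, not the official client: it assumes the verdict is a JSON object with a `band` field, that `iterate` accepts the same `artifact_parts` shape as `feedback` and returns `{ "verdict": ... }`, and that the key's 40 hex characters may be upper or lower case. `run_loop`, `auth_headers`, `http_post`, and the `revise` callback are hypothetical names introduced for illustration; the OpenAPI spec is the authority on the real schemas.

```python
import json
import re
import urllib.request
from typing import Callable

BASE_URL = "https://api.seaotter.ai"  # prod base from the table above
# Key shape sk-otter-<40 hex>; accepting either letter case is an assumption.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def http_post(method: str, url: str, headers: dict, body: dict) -> dict:
    """Default transport: send a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def auth_headers(api_key: str) -> dict:
    """Build request headers, rejecting keys that do not look like eval keys."""
    if not KEY_PATTERN.fullmatch(api_key):
        raise ValueError("expected a key shaped like sk-otter-<40 hex>")
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def text_parts(text: str) -> list:
    """Wrap plain text in the artifact_parts shape used by the curl example."""
    return [{"mime_type": "text/plain", "text": text}]


def run_loop(
    api_key: str,
    policy_id: str,
    prompt: str,
    draft: str,
    revise: Callable[[str, dict], str],
    transport: Callable[[str, str, dict, dict], dict] = http_post,
    gate: str = "ship",
    max_rounds: int = 5,
) -> tuple:
    """Grade once, then revise and iterate until the band reaches the gate."""
    headers = auth_headers(api_key)
    result = transport(
        "POST",
        f"{BASE_URL}/api/v1/eval/feedback",
        headers,
        {
            "modality": "text",
            "policy_id": policy_id,
            "prompt": prompt,
            "artifact_parts": text_parts(draft),
            "return_feedback_artifacts": True,
        },
    )
    run_id, verdict = result["run_id"], result["verdict"]
    for _ in range(max_rounds):
        if verdict.get("band") == gate:  # ASSUMPTION: band is a field on the verdict
            break
        draft = revise(draft, verdict)
        # ASSUMPTION: iterate takes artifact_parts and returns {"verdict": ...}
        result = transport(
            "POST",
            f"{BASE_URL}/api/v1/eval/runs/{run_id}/iterate",
            headers,
            {"artifact_parts": text_parts(draft)},
        )
        verdict = result["verdict"]
    return draft, verdict
```

The transport is injectable so the loop can be exercised without network access, and `max_rounds` bounds the number of revisions in case the band never clears the gate.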
KNOWN GAP — FOLLOW-UP
Today the first eval key is minted once by a human: a signed-in org user at /developers (POST /api/v1/agent-keys requires a product-user JWT, not an eval key). After that, the agent runs the whole loop with no human in the loop. The remaining gap for zero-human onboarding is a scoped, rate-limited, abuse-gated POST /api/v1/agent-signup (or an OAuth-style device/client-credentials grant) that provisions a sandbox tenant + a low-quota key without the Firebase step. Until then, the one-time human key mint is the single manual step.