BUILD WITH OTTERLOOP
OtterLoop is the agent-facing contract for SeaOtter's hostile critic. The same loop works on or off AgentOS, across any framework, model, and cloud: submit the work, read the verdict, revise, and iterate until the band clears your gate.
INTEGRATION
Everything routes to the same eval contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit record. The MCP server and Python SDK are thin wrappers over that HTTP surface.
INTEGRATION
Everything routes to the same eval contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit record. The MCP server and Python SDK are thin wrappers over that HTTP surface.
MCP
Use in Claude, Codex, Cursor, or any MCP-speaking runtime.
{ "mcpServers": { "otterloop": {
"command": "python", "args": ["-m", "otterloop.mcp_server"],
"env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
"OTTERLOOP_API_KEY": "sk-otter-...",
"OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }curl
One-shot score call over HTTP.
curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"..."}],
"return_feedback_artifacts": true }'Python SDK
Let the client drive produce → grade → revise until ship.
from otterloop import OtterLoopClient otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja") final = otter.loop(produce=lambda feedback: my_agent.revise(feedback), work=my_agent.first_draft(), modality="document", references=["file://brand-guide.pdf", "file://gold-postmortem.md"], max_rounds=5, target_band="ship")
DEVELOPER CONSOLE
Mint an eval-API key for your account, then copy a ready-to-paste MCP, Python SDK, or curl setup to wire any agent into SeaOtter's hostile critic. The secret is shown once — store it before you leave this page.
Loading keys…
AGENT QUICKSTART
SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).
Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.
1 — mint a key (one-time, signed-in user)
Returns the full sk-otter-... secret exactly once.
curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
-H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
-d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }2 — connect over MCP (.mcp.json)
Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.
{ "mcpServers": { "otterloop": {
"command": "python", "args": ["-m", "otterloop.mcp_server"],
"env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
"OTTERLOOP_API_KEY": "sk-otter-...",
"OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }3 — score over HTTP
One-shot grade -> verdict + run_id to keep iterating.
curl -s https://api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
"return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }5 — iterate until it ships
Re-score a revision against the same run.
curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
"artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'VERDICT CONTRACT
The verdict is designed for frontier agents, not screenshots of human review. It carries score, band, flaws, upgrades, anchors, rationale, and rich-feedback artifact refs the agent can use directly.
Verdict schema
{
"score": 0.91,
"band": "ship",
"decision": "ship",
"flaws": [
{ "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
],
"upgrades": [
{ "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
],
"rationale": "Localized feedback so the agent can revise the exact failing region.",
"feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}CONDITIONING
OtterLoop is not a generic "is this good" score. The contract can condition the verdict on your organisation's policy, the prompt or intent the agent was given, and the reference files it must obey.
Apply the right acceptance policy so the same artifact can clear one team and fail another for a defensible reason.
Carry the original ask into the critic so it judges the work against the assignment, not against a generic idealized answer.
Brand guides, gold examples, source-of-truth docs, and previous iterations all become conditioning evidence.
MODALITIES
The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include both the canonical verdict JSON and media a human or agent can read.
| MODALITIES | RETURNS |
|---|---|
| Image or design frame | Annotated PNG plus flaw bounding boxes and a markdown report |
| Deck, PDF, or document | Annotated pages, per-page notes, and machine-readable anchors |
| Spreadsheet | Flagged cells, criterion-grounded notes, and structured deltas |
| Video or audio | Timestamp markers, captions, and localized rationale |
| Text or code | Span-anchored review with upgrade drafts the agent can apply |