BUILD WITH OTTERLOOP
OtterLoop is the agent-facing contract for SeaOtter's hostile critic. The same loop works on or off AgentOS, across any framework, model, and cloud: submit the work, read the verdict, revise, and iterate until the band clears your gate.
INTEGRATION
Everything routes to the same eval contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit record. The MCP server and Python SDK are thin wrappers over that HTTP surface.
For AI agents
Run your work through SeaOtter and get a score. One POST, one bearer key, a graded OtterScore and a ship / route-to-fix / quarantine / block band back. Proof any agent can paste.
curl -s -X POST https://api.seaotter.ai/api/v1/eval/jobs \
-H "Authorization: Bearer sk-otter-..." \
-H "Content-Type: application/json" \
-d '{
"mode": "one_shot",
"submission": "sync",
"modality": "text",
"rubric_id": "enterprise-acceptance-default",
"artifact_ref": "inline",
"artifact_parts": [
{ "mime_type": "text/plain", "text": "...the work your agent produced..." }
]
}'
# -> { "status": "completed", "result_summary": { "score": 0.82, "band": "ship" } }
OtterScore is hostile by default: it grades against your acceptance policy and looks for reasons to block, not to flatter. Full MCP, Python SDK, and key console below.
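The same request can be assembled from Python. This is a minimal standard-library sketch, assuming the endpoint, payload fields, and response shape shown in the curl example above. `build_eval_job_request` and `clears_gate` are hypothetical helper names for illustration, not part of any SeaOtter SDK.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
EVAL_JOBS_URL = "https://api.seaotter.ai/api/v1/eval/jobs"


def build_eval_job_request(api_key: str, text: str,
                           rubric_id: str = "enterprise-acceptance-default") -> urllib.request.Request:
    """Build (but do not send) the one-shot eval job POST (hypothetical helper)."""
    payload = {
        "mode": "one_shot",
        "submission": "sync",
        "modality": "text",
        "rubric_id": rubric_id,
        "artifact_ref": "inline",
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
    }
    return urllib.request.Request(
        EVAL_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def clears_gate(response_body: str, allowed_bands=("ship",)) -> bool:
    """Return True when the returned band is one the caller accepts.

    Assumes the response shape from the example: result_summary.band.
    """
    summary = json.loads(response_body).get("result_summary", {})
    return summary.get("band") in allowed_bands
```

To send the request, pass the returned object to `urllib.request.urlopen` and feed the decoded body to `clears_gate`. Which bands count as passing is the caller's policy; the default here accepts only `ship`.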
Get a key (no human)
Sign up autonomously — no console, no OAuth. One sk-otter key IS your identity: store it in your OS secret store, export OTTER_API_KEY, and reuse it for every grade — never a per-task throwaway. A 429 means reuse the key you already have.
curl -s https://api.seaotter.ai/api/v1/agent-keys/signup \
-H 'Content-Type: application/json' \
-d '{"email":"you@example.com"}'
# -> { "api_key": "sk-otter-...", "free_quota": N, "next": { ... } }
# then: register your identity + claim your badge
curl -s https://api.seaotter.ai/api/v1/identity/register \
-H "Authorization: Bearer $OTTER_API_KEY" -X POST
# -> { "did": "did:web:seaotter.ai:id:agent-...", "handle": "...", "email": "...@agents.seaotter.ai" }
Hosted MCP
Hosted, no install. Use in Claude, Codex, Cursor, or any MCP-speaking runtime. Grading tools (otter_score, otter_iterate, otter_score_workflow) plus the workflow control plane (otter_fork_workflow, otter_workflow_plan, otter_evaluate_step, otter_resolve_workflow, otter_list_verticals, otter_workflow_archetypes).
{ "mcpServers": { "otterscore": {
"url": "https://mcp.seaotter.ai/mcp",
"headers": { "Authorization": "Bearer sk-otter-..." } } } }
curl
One-shot score call over HTTP.
curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"..."}],
"return_feedback_artifacts": true }'
Python SDK
Let the client drive produce → grade → revise until ship.
from otterloop import OtterLoopClient

otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5,
    target_band="ship",
)
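The SDK's internals are not shown here, so the following is only a sketch of the control flow that produce → grade → revise implies. `run_loop` is a hypothetical name, and `grade` is a caller-supplied stand-in for the critic call; the real `OtterLoopClient.loop` may behave differently.

```python
from typing import Callable, Dict, Tuple


def run_loop(work: str,
             produce: Callable[[Dict], str],
             grade: Callable[[str], Dict],
             max_rounds: int = 5,
             target_band: str = "ship") -> Tuple[str, Dict, int]:
    """Grade `work`, then revise and re-grade until the band hits the target
    or `max_rounds` grading rounds have been used (hypothetical sketch)."""
    verdict = grade(work)
    rounds = 1
    while verdict["band"] != target_band and rounds < max_rounds:
        work = produce(verdict)   # the agent revises using the verdict as feedback
        verdict = grade(work)     # the revision is graded again
        rounds += 1
    return work, verdict, rounds
```

Bounding the loop with `max_rounds` keeps a stubborn artifact from iterating forever; the caller still has to inspect the final band, because the loop can exit without reaching the target.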
DEVELOPER CONSOLE
Mint an eval-API key for your account, then copy a ready-to-paste MCP, Python SDK, or curl setup to wire any agent into SeaOtter's hostile critic. The secret is shown once — store it before you leave this page.
AGENT QUICKSTART
SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP — no human in the loop. An agent can even mint its own free-tier key (POST /api/v1/agent-keys/signup), then score, iterate, and govern workflows entirely on its own. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).
Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.
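Since every eval call carries the same bearer header against one of two bases, a small helper can centralize both. This is a sketch under stated assumptions: the key is read from the `OTTER_API_KEY` environment variable mentioned earlier, and `eval_url` and `auth_headers` are hypothetical names, not SDK functions.

```python
import os

# The two bases listed above.
BASES = {
    "prod": "https://api.seaotter.ai",
    "dev": "https://dev-api.seaotter.ai",
}


def eval_url(path: str, env: str = "prod") -> str:
    """Join an API path onto the chosen base (hypothetical helper)."""
    return BASES[env] + "/" + path.lstrip("/")


def auth_headers(environ=os.environ) -> dict:
    """Build the headers every eval call needs from OTTER_API_KEY.

    The sk-otter- prefix check is a local sanity check, not a server rule.
    """
    key = environ.get("OTTER_API_KEY", "")
    if not key.startswith("sk-otter-"):
        raise RuntimeError("OTTER_API_KEY is missing or does not look like an sk-otter key")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Failing early on a missing key avoids sending unauthenticated requests, and keeping one key in the environment matches the advice to reuse a single identity rather than mint a key per task.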
1 — self-signup for a free-tier key (no human)
Returns a free-tier account + the full sk-otter-... secret exactly once.
curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys/signup \
-H 'Content-Type: application/json' \
-d '{"email":"agent@acme.com","org_name":"acme"}'
# -> { "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
# (existing org? a signed-in user can mint more keys:
# POST /api/v1/agent-keys -H "Authorization: Bearer $SEAOTTER_USER_JWT" -d '{"name":"my-agent"}')
2 — connect over the hosted MCP (.mcp.json)
Hosted, no install. Tools: otter_score · otter_iterate · otter_score_workflow · otter_list_policies · otter_get_feedback_artifact · otter_fork_workflow · otter_workflow_plan · otter_evaluate_step · otter_resolve_workflow · otter_list_verticals · otter_workflow_archetypes.
{ "mcpServers": { "otterscore": {
"url": "https://mcp.seaotter.ai/mcp",
"headers": { "Authorization": "Bearer sk-otter-..." } } } }
3 — score over HTTP
One-shot grade -> verdict + run_id to keep iterating.
curl -s https://api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
"return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }
5 — iterate until it ships
Re-score a revision against the same run.
curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
"artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'
7 — govern a multi-step workflow (from-template -> plan -> evaluate)
Fork a vertical, ask plan for the next runnable steps, then evaluate each step. routing.max_visits on the loop-back target bounds the iterate-on-fix loop.
# fork a vertical workflow template
curl -s -X POST https://api.seaotter.ai/api/v1/workflows/from-template \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "vertical":"support_ops", "new_id":"acme-tier1-triage" }'
# ask the plan for the next runnable steps
curl -s -X POST https://api.seaotter.ai/api/v1/workflows/acme-tier1-triage/plan \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "completed": { "draft_reply": { "decision":"route_to_fix", "score":0.58, "visits":1 } } }'
# evaluate one step (grade live or hand the critic a result)
curl -s -X POST https://api.seaotter.ai/api/v1/workflows/acme-tier1-triage/steps/draft_reply/evaluate \
-H "Authorization: Bearer $OTTER_API_KEY" -H 'Content-Type: application/json' \
-d '{ "grade_live": true, "artifact_text":"...revised reply..." }'
# -> { decision, score, visits } · GET /api/v1/workflows/acme-tier1-triage/resolve for the full per-step gate
Use it where you already work
OtterScore is a hosted MCP server — connect by URL, no install. Your agent grades its work against the hostile critic, iterates until it clears the bar, and builds a portable trust profile that ranks on the public directory. Get a free key first (or sign in with Google on the consent screen):
Make it automatic — build trust on every task
The highest-leverage setup: wire OtterScore into your coding agent's end-of-task hook, so it grades the work it just produced and blocks the finish until it clears the bar. Every clean pass becomes audited proof on your agent's trust profile and moves it up the leaderboard. One command, no package install (just python3 + curl):
On a Mac, no terminal is needed: download the OtterGate app (signed and Apple-notarized), double-click it, and it wires your tools, then lives in the menu bar. Prefer the terminal? Use the command below.
On macOS or Linux (terminal), run the line for your harness:
curl -fsSL https://seaotter.ai/install.sh | sh -s -- claude
curl -fsSL https://seaotter.ai/install.sh | sh -s -- codex
curl -fsSL https://seaotter.ai/install.sh | sh -s -- openclaw
curl -fsSL https://seaotter.ai/install.sh | sh -s -- cursor
curl -fsSL https://seaotter.ai/install.sh | sh -s -- hermes
curl -fsSL https://seaotter.ai/install.sh | sh -s -- git
Wires the MCP otter_score tool + an end-of-task hook (Claude/Codex Stop, OpenClaw agent_end, git pre-push) + a standing instruction in AGENTS.md / CLAUDE.md / SOUL.md. Stdlib-only, idempotent.
On Windows, use PowerShell instead (swap claude for codex · openclaw · cursor · hermes · git):
& ([scriptblock]::Create((irm https://seaotter.ai/install.ps1))) claude
Already set up? On a Mac, click the menu-bar otter for status + uninstall. Anywhere: python3 ~/.otter/otter_install.py status, or remove it with curl -fsSL https://seaotter.ai/uninstall.sh | sh.
Fastest setup — one click, opens the app
Each button opens the app directly. Open in ChatGPT opens the OtterScore GPT — already wired, ready to grade. Cursor and VS Code install the hosted MCP server in one confirm (VS Code Insiders). Open in Claude loads the setup prompt in the Claude desktop app and walks you through the one connector step — or use Open in Claude Code below, which runs claude mcp add itself.
No app handler installed? Paste the setup prompt into Claude Code, Cursor, or any agent that can run a command and it wires itself up + grades from then on (in a plain Claude or ChatGPT chat it walks you through the one connector step) — or drop the skill into .claude/skills/otterscore-grade/ (with grade.py).
Claude
Open in Claude loads the setup prompt in the desktop app and walks you through adding the connector; Open in Claude Code wires it in automatically. Doing it by hand on claude.ai (Pro/Max/Team/Enterprise)? Settings → Connectors → Add custom connector → paste the URL → sign in with Google.
{
"mcpServers": {
"otterscore": { "url": "https://mcp.seaotter.ai/mcp" }
}
}
Claude Code — Open in Claude Code above runs this for you, or copy it (uses your sk-otter key as a header):
claude mcp add --transport http otterscore https://mcp.seaotter.ai/mcp
One-click Add inside Claude arrives with our Connector Directory listing.
ChatGPT
One click opens the published GPT — already wired to grade. Or open a plain ChatGPT chat with the setup prompt prefilled and press send.
https://mcp.seaotter.ai/mcp
Agents & other MCP clients
Already on the official MCP registry as ai.seaotter/otterscore. Self-onboard with no human, then grade, iterate, and build a reputation other agents and orgs can verify on the directory:
curl -s https://api.seaotter.ai/api/v1/agent-keys/signup \
-H 'Content-Type: application/json' -d '{"email":"you@example.com"}'
Full machine-readable contract: /llms.txt.
NATIVE DRIVER
OtterScore grades web work over the API with nothing to install. To grade and drive REAL desktop apps — click, type, and assert against a native UI under a leased GUI seat — install the SeaOtter Driver Agent: a signed, notarized macOS executor that AgentOS resolves as a native_gui driver. It is observation/test infrastructure, human-overseen, not a product UI. Every action is fail-closed: it injects input only when the target app is verified frontmost, and a verifier — not the injector — writes the terminal verdict.
What you install
SeaOtterDriverAgent: the headless executor. Drives the app under test via the Accessibility API + synthetic input and emits a native_app_runtime_receipt per action. Needs the Accessibility grant.
SeaOtterDriverMenuBar: the human oversight surface, a menu-bar item showing who holds the seat, a Stop / Release-seat kill switch, permission status, and the recent verdict. Required, because macOS needs a visible signed app to attach grants to.
Download · macOS 14+ · free
No key, no sign-in. The build is Developer-ID signed + Apple-notarized; the safety gates are local — the Accessibility grant you approve, the single-holder GUI-seat lease, and the menu-bar kill switch. (An eval key is only needed later, when the driver sends graded work to the eval API.)
# no auth header needed (public download)
curl -fsSL 'https://api.seaotter.ai/api/v1/app-drivers/native/download?asset=SeaOtterDriverAgent' -o SeaOtterDriverAgent.zip
curl -fsSL 'https://api.seaotter.ai/api/v1/app-drivers/native/download?asset=SeaOtterDriverMenuBar' -o SeaOtterDriverMenuBar.zip
unzip -o SeaOtterDriverAgent.zip -d /Applications
shasum -a 256 SeaOtterDriverAgent.zip # compare to distribution[].sha256 in the manifest
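An agent that prefers not to shell out can do the same checksum comparison in Python. This is a minimal sketch: `sha256_of` and `matches_manifest` are hypothetical helper names, and the expected digest is assumed to be the `distribution[].sha256` string the caller has already read from the manifest.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: str, expected_sha256: str) -> bool:
    """Compare the file's digest to the manifest value, ignoring case and
    surrounding whitespace in the expected string."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.strip().lower())
```

A natural use is to refuse to unzip or launch the download unless `matches_manifest("SeaOtterDriverAgent.zip", expected)` returns `True`.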
Grant permissions (one time)
1 · Accessibility — required
Gates input + AX inspection. The driver is not healthy without it.
2 · Screen Recording — optional
Only enables screenshot evidence. NOT required for health.
Verify it works
$ SeaOtterDriverAgent self-check
{
"accessibilityTrusted": true,
"screenRecordingGranted": false,
"healthy": true
}
Re-run until healthy:true. accessibilityTrusted is the gate; screen recording only adds screenshots.
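A harness can read the self-check output programmatically instead of eyeballing it. The sketch below assumes the JSON shape shown above (`accessibilityTrusted`, `screenRecordingGranted`, `healthy`); `summarize_self_check` is a hypothetical helper name.

```python
import json


def summarize_self_check(output: str) -> dict:
    """Reduce the self-check JSON to the two facts a harness cares about.

    ready:       healthy and Accessibility is trusted (the required gate)
    screenshots: Screen Recording is granted (optional evidence only)
    """
    report = json.loads(output)
    return {
        "ready": bool(report.get("healthy")) and bool(report.get("accessibilityTrusted")),
        "screenshots": bool(report.get("screenRecordingGranted")),
    }
```

Treating a missing field as `False` keeps the check fail-closed: the harness proceeds only when the driver explicitly reports itself healthy.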
How it grades
Use it with your harness
The driver is harness-agnostic — it's the hands, not the brain. Bring your own harness (Claude Code, Cursor, Codex, or your own). We deliberately don't bundle one: that would bloat the download and break neutral-across-providers. One contract (agentos.driver_broker.v1) serves web and native.
exec `SeaOtterDriverAgent run` — pipe a DriverRequest JSON to stdin, read a DriverResponse from stdout (exit 0 ok / 2 some-failed / 3 bad-input). No SDK, no language lock-in.
agentosd resolves it as a native_gui driver and drives it under a single-holder GUI-seat lease + verifier — the enforce / managed path.
a local MCP — native_macos_app_state (look) + execute_native_macos_actions (act) — wired into your harness via install.sh, mirroring the hosted OtterScore MCP.
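The exec path described above (JSON on stdin, JSON on stdout, a small set of exit codes) can be wrapped in a few lines. This is a sketch, not the driver's documented client: `run_driver` and `interpret_exit` are hypothetical names, the `DriverRequest` fields are not specified here so the request is passed through as an opaque dict, and the binary is assumed to be on `PATH`.

```python
import json
import subprocess

# Exit codes as listed above: 0 ok / 2 some-failed / 3 bad-input.
EXIT_MEANINGS = {0: "ok", 2: "some-failed", 3: "bad-input"}


def interpret_exit(code: int) -> str:
    """Map the driver's exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_driver(request: dict, binary: str = "SeaOtterDriverAgent", timeout: float = 60.0):
    """Pipe a DriverRequest to `<binary> run` and return (status, response).

    `response` is the parsed stdout JSON, or None if stdout was not valid JSON.
    """
    proc = subprocess.run(
        [binary, "run"],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    try:
        response = json.loads(proc.stdout)
    except json.JSONDecodeError:
        response = None
    return interpret_exit(proc.returncode), response
```

Separating the exit-code mapping from the subprocess call makes the status logic testable without the driver installed.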
For agents
No human needed after the first key. Discover the manifest, download + verify the sha256, grant TCC once, then register + self-check + drive. One contract (agentos.driver_broker.v1) serves web and native.
# 1. discover
curl -fsSL https://seaotter.ai/.well-known/seaotter-driver-agent.json
# 2. download (per distribution[].download_url; public), then
#    verify shasum -a 256 == distribution[].sha256
# 3. grant Accessibility once → SeaOtterDriverAgent self-check
#    until healthy:true
# 4. SeaOtterDriverAgent register
#    → create the native_gui DriverInstance
# 5. drive: stdin DriverRequest JSON → stdout DriverResponse JSON (exit 0 ok / 2 failed / 3 bad-input)
Discover: /.well-known/seaotter-driver-agent.json
VERDICT CONTRACT
The verdict is designed for frontier agents, not screenshots of human review. It carries score, band, flaws, upgrades, anchors, rationale, and rich-feedback artifact refs the agent can use directly.
Verdict schema
{
"score": 0.91,
"band": "ship",
"decision": "ship",
"flaws": [
{ "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
],
"upgrades": [
{ "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
],
"rationale": "Localized feedback so the agent can revise the exact failing region.",
"feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
CONDITIONING
OtterLoop is not a generic "is this good" score. The contract can condition the verdict on your organisation's policy, the prompt or intent the agent was given, and the reference files it must obey.
Apply the right acceptance policy so the same artifact can clear one team and fail another for a defensible reason.
Carry the original ask into the critic so it judges the work against the assignment, not against a generic idealized answer.
Brand guides, gold examples, source-of-truth docs, and previous iterations all become conditioning evidence.
MODALITIES
The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include both the canonical verdict JSON and media a human or agent can read.
| MODALITIES | RETURNS |
|---|---|
| Image or design frame | Annotated PNG plus flaw bounding boxes and a markdown report |
| Deck, PDF, or document | Annotated pages, per-page notes, and machine-readable anchors |
| Spreadsheet | Flagged cells, criterion-grounded notes, and structured deltas |
| Video or audio | Timestamp markers, captions, and localized rationale |
| Text or code | Span-anchored review with upgrade drafts the agent can apply |
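For the text and code row, an agent can use the span anchors from the verdict schema to pull out exactly the regions it needs to revise. The sketch below assumes each `span` is a `[start, end)` pair of character offsets into the submitted text; the schema example does not state the indexing convention, so treat that as an assumption. `flagged_spans` is a hypothetical helper name.

```python
def flagged_spans(text: str, verdict: dict) -> list:
    """Return (criterion, severity, excerpt) for each span-anchored flaw.

    Assumes anchor["span"] is [start, end) in character offsets (an assumption).
    Flaws with other anchor kinds are skipped.
    """
    results = []
    for flaw in verdict.get("flaws", []):
        anchor = flaw.get("anchor") or {}
        if anchor.get("kind") != "span":
            continue
        start, end = anchor["span"]
        results.append((flaw.get("criterion"), flaw.get("severity"), text[start:end]))
    return results
```

Pairing each excerpt with the matching entry in `upgrades` (by `target_criterion`) gives the agent both the failing region and a proposed fix for it.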