# SeaOtter — the acceptance layer for enterprise agent work

> SeaOtter grades every artifact your agents produce against your acceptance policy and gates it before it reaches production. **OtterScore** is a hostile-by-default, adversarially-aligned critic that scores work (code, text, documents, decks, spreadsheets, images, video) and its trajectory on one published band — ship / route to fix / quarantine / block. SeaOtter is **agent-native**: an agent can discover the API, get a key, connect over MCP or HTTP, score work, read the flaws, and iterate with the critic until the work passes the gate. This file is the machine-readable entry point — read it first.

This page is for AI agents and automated clients. The whole thesis is agents iterating with the critic at scale, so agent self-onboarding is first-class. The fastest path is the OtterLoop loop: get a key → connect (MCP or HTTP) → score → read flaws → iterate → optionally workflow-score / benchmark.

Public bases:

- Web: https://seaotter.ai (prod), https://dev.seaotter.ai (dev)
- API: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev)
- Auth header for every eval call: `Authorization: Bearer ` (or `X-OtterBench-Key: `)

## What SeaOtter is

- [OtterScore — the readiness evaluator](https://seaotter.ai/critics): a hostile-by-default critic aligned to find reasons to block, not to approve. It grades each artifact and its trajectory against your acceptance policy and returns a score (0–100, lower = more flawed), a band (ship / route_to_fix / quarantine / block), located flaws, and concrete upgrades.
- [Rubrics — the acceptance criteria](https://seaotter.ai/rubrics): per-modality, versioned criteria + weights the critic grades against. Browse, fork, and preview them.
- [Live demo — paste work, see the critic push back](https://seaotter.ai/demo/eval): the loop in the browser.
- [Developer / agent onboarding](https://seaotter.ai/developers): get a key, MCP / SDK / curl quickstart, the verdict schema.
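
Because every verdict reports one of four bands, a client can gate on the band with a simple ordering. The following is a minimal sketch, assuming the bands rank from best to worst in the order they are listed (ship, route_to_fix, quarantine, block). The helper name `clears_gate` and the sample verdicts are illustrative and not part of the SeaOtter API.

```python
# Band names are taken from this file. Treating them as ordered
# best-to-worst is an assumption made for this sketch.
BANDS = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(verdict: dict, target_band: str = "ship") -> bool:
    """Return True when the verdict's band is at least as good as target_band.

    Hypothetical helper, not a SeaOtter API call. Unknown or missing
    bands fail closed (return False).
    """
    band = verdict.get("band")
    if band not in BANDS or target_band not in BANDS:
        return False
    return BANDS.index(band) <= BANDS.index(target_band)


if __name__ == "__main__":
    # Made-up sample verdicts, for illustration only.
    print(clears_gate({"band": "ship"}))
    print(clears_gate({"band": "quarantine"}, target_band="route_to_fix"))
```

Failing closed on an unrecognized band keeps the gate conservative if the band vocabulary ever changes.
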
## Agent quickstart (the loop)

The exact loop, in order. Steps 3–6 are pure HTTP (or MCP tools) and need only the key.

1. **Get a key — two ways.** (a) *Fully autonomous, no human:* `POST https://api.seaotter.ai/api/v1/agent-keys/signup` with `{ "email": "", "org_name": "" }` — creates a free-tier account and returns your `sk-otter-<40 hex>` secret (shown once) plus your `free_quota`. (b) *Human mint:* a signed-in org user mints a key at https://seaotter.ai/developers (`POST /api/v1/agent-keys`). Either way, use the secret as your bearer token.
2. **Connect.** Drop the hosted MCP server below into an MCP-speaking runtime (Claude / Codex / Cursor) — connect by URL, no install — or call the HTTP API directly.
3. **Score.** Send the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back a verdict.
4. **Read the flaws.** Each flaw has `criterion`, `severity`, `evidence`, `detail`, and an `anchor` (where: bbox / timestamp / cell / slide / page / span). `upgrades[]` are concrete fixes.
5. **Iterate.** Revise the work against the flaws and re-score (`POST /api/v1/eval/runs/{id}/iterate`) until `band` clears your gate (e.g. `ship`).
6. **Workflow / benchmark.** Score an end-to-end multi-step workflow topology with `POST /api/v1/eval/workflows/{id}/topology` for a composite + per-step + chain critique.
7. **Pay when the free quota runs out.** After `free_quota` grades the eval API returns `HTTP 402` with a `checkout_url` — a Stripe Checkout link your owner opens to add a payment method (or call `POST /api/v1/billing/pay-link` any time to fetch it; `GET /api/v1/billing/status` shows remaining free + billing state). Once paid, usage is metered and you keep grading.

## MCP

OtterScore is a **hosted MCP server** — connect by URL, no install, no package. The whole loop is exposed as read-only tools (no side effects → auto-approved in non-interactive runs).
- `.mcp.json` (Claude / Cursor) or `config.toml` `[mcp_servers.otterscore]` (Codex):

```json
{
  "mcpServers": {
    "otterscore": {
      "url": "https://mcp.seaotter.ai/mcp",
      "headers": { "Authorization": "Bearer sk-otter-..." }
    }
  }
}
```

- Tools the agent gets: `otter_list_policies`, `otter_score`, `otter_iterate`, `otter_score_async`, `otter_job_result`, `otter_score_stream`, `otter_score_workflow`, `otter_get_feedback_artifact`.
- For a slow/large grade (or `mode="agentic"` deep grading), prefer the non-blocking pair: `otter_score_async` returns a `job_id` immediately, then poll `otter_job_result(job_id)` until `status="completed"` — so the call never blocks or times out while the critic grades. `otter_score` stays the simple one-shot for quick grades.
- Your `sk-otter-...` key authenticates every call and bills your tenant — get one free at https://seaotter.ai/developers or `POST /api/v1/agent-keys/signup`. (Transport: MCP Streamable HTTP, stateless.)

## HTTP API

Every eval call carries `Authorization: Bearer ` and `Content-Type: application/json`. Base: `https://api.seaotter.ai` (prod) / `https://dev-api.seaotter.ai` (dev).

- `GET /api/v1/eval/policies` — org acceptance policies you can condition grading on.
- `GET /api/v1/eval/rubrics` — list rubrics (acceptance criteria); `GET /api/v1/eval/rubrics/{id}` for one.
- `POST /api/v1/eval/feedback` — one-shot grade → flat verdict + `run_id` to keep iterating (the OtterLoop convenience entry).
- `POST /api/v1/eval/runs` — create a run + first verdict (lower-level: full conditioning slots).
- `POST /api/v1/eval/runs/{id}/iterate` — submit a revision, get the next verdict.
- `GET /api/v1/eval/runs/{id}` / `GET /api/v1/eval/runs/{id}/score` — fetch a run / its latest score.
- `POST /api/v1/eval/workflows/{id}/topology` — score an end-to-end workflow graph (composite + per-step + chain critique).
- `POST /api/v1/eval/feedback` returns rich feedback artifacts when `return_feedback_artifacts: true`; fetch one with `GET /api/v1/eval/feedback-artifacts/{ref}`.
- `GET/POST /api/v1/agent-keys` — list / mint eval keys (requires a signed-in org user, not an eval key).

One-shot score over HTTP:

```bash
curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{
    "modality": "text",
    "policy_id": "acme-prod-acceptance",
    "locale": "en",
    "prompt": "Draft the Q3 incident postmortem",
    "artifact_parts": [{"mime_type": "text/plain", "text": "...your work..."}],
    "return_feedback_artifacts": true
  }'
```

The response carries `run_id` and a `verdict` (`score`, `band`, `flaws[]`, `upgrades[]`). Use the `run_id` to iterate.

## Get a key

- [Developer / agent console](https://seaotter.ai/developers): a signed-in org user creates an account (https://seaotter.ai/signup), then mints a key (`POST /api/v1/agent-keys`, body `{ "name": "my-agent" }`). The full `sk-otter-...` secret is returned once. Hand it to your agent as `OTTERLOOP_API_KEY` / the bearer token.
- Today an org mints the key once via the console; the agent then uses it for every step above. No-human-step agent self-signup is a documented follow-up: /docs/agent-native.md.

## API reference

- [OpenAPI spec (machine-readable)](https://api.seaotter.ai/api/v1/openapi.json): full schemas for the eval, agent-key, rubric, and policy routes.
- [Interactive API docs](https://api.seaotter.ai/api/v1/docs): Swagger UI.
- [Critics catalog](https://seaotter.ai/critics) · [Rubric library](https://seaotter.ai/rubrics) · [Live demo](https://seaotter.ai/demo/eval)

## Optional

- [Agent-native contract (for maintainers)](https://seaotter.ai/docs/agent-native): discovery → register → key → MCP/HTTP → score/iterate/workflow/benchmark, and the known self-signup follow-up.
- [Python SDK (otterloop)](https://pypi.org/project/otterloop/): `OtterLoopClient` wraps the same HTTP surface; `otter.loop(produce=..., work=..., target_band="ship")` drives produce → grade → revise.
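
For clients that do not use the SDK, the score, read flaws, iterate loop from the quickstart can be sketched over plain HTTP with only the Python standard library. This is a rough sketch under stated assumptions, not the SDK's implementation: the function names `post_json` and `score_until_gate` and the `revise` callback are hypothetical, the request body sent to `/iterate` is a guess (this file does not document it), and the code accepts the verdict either nested under `verdict` or flat at the top level of the response. Check the OpenAPI spec before relying on any of these shapes.

```python
import json
import urllib.error
import urllib.request

API_BASE = "https://dev-api.seaotter.ai"  # dev base listed in this file


def post_json(path, body, api_key):
    """POST a JSON body with the bearer key; return (status, parsed JSON)."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses (e.g. HTTP 402) are assumed to carry a JSON body.
        try:
            return err.code, json.loads(err.read())
        except ValueError:
            return err.code, {}


def score_until_gate(prompt, draft, revise, api_key,
                     target_band="ship", max_rounds=3, post=post_json):
    """Score a text draft, then revise and re-score until the band clears.

    `revise(draft, flaws, upgrades)` is a caller-supplied function that
    returns the next draft. Returns the last verdict seen.
    """
    parts = [{"mime_type": "text/plain", "text": draft}]
    status, data = post(
        "/api/v1/eval/feedback",
        {"modality": "text", "prompt": prompt, "artifact_parts": parts},
        api_key,
    )
    run_id = None
    for round_no in range(max_rounds + 1):
        if status == 402:
            # Free quota exhausted: surface the checkout link to the owner.
            raise RuntimeError(f"payment required: {data.get('checkout_url')}")
        if not 200 <= status < 300:
            raise RuntimeError(f"eval call failed with HTTP {status}")
        run_id = data.get("run_id", run_id)
        verdict = data.get("verdict", data)  # nested or flat: both accepted
        if verdict.get("band") == target_band or round_no == max_rounds:
            return verdict
        draft = revise(draft, verdict.get("flaws", []), verdict.get("upgrades", []))
        # ASSUMPTION: the iterate body mirrors the artifact_parts field.
        status, data = post(
            f"/api/v1/eval/runs/{run_id}/iterate",
            {"artifact_parts": [{"mime_type": "text/plain", "text": draft}]},
            api_key,
        )
```

A caller would pass the key it holds (for example the value of `OTTERLOOP_API_KEY`) and a `revise` function that rewrites the draft from the returned flaws. The `post` parameter exists so the transport can be swapped for a stub in tests.
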