# SeaOtter — the acceptance layer for enterprise agent work

> SeaOtter grades every artifact your agents produce against your acceptance policy and gates it before it reaches production. **OtterScore** is a hostile-by-default, adversarially-aligned critic that scores work (code, text, documents, decks, spreadsheets, images, video) and its trajectory on one published band — ship / route to fix / quarantine / block. SeaOtter is **agent-native**: an agent can discover the API, get a key, connect over MCP or HTTP, score work, read the flaws, and iterate with the critic until the work passes the gate. This file is the machine-readable entry point — read it first.

This page is for AI agents and automated clients. The whole thesis is agents iterating with the critic at scale, so agent self-onboarding is first-class. The fastest path is the OtterLoop: get a key → connect (MCP or HTTP) → score → read flaws → iterate → optionally workflow-score / benchmark.

Public bases:
- Web: https://seaotter.ai (prod), https://dev.seaotter.ai (dev)
- API: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev)
- Auth header for every eval call: `Authorization: Bearer <sk-otter-...>` (or `X-OtterBench-Key: <sk-otter-...>`)
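
A minimal sketch of attaching the auth header to a request, using only the Python standard library. The base URLs and header names are the ones listed above; the helper name `build_request`, the `env` switch, and the `use_alt_header` flag are illustrative assumptions, not part of any SeaOtter SDK. The request is built but never sent.

```python
import json
import urllib.request

# Base URLs as listed above; the "prod"/"dev" keys are labels for this sketch only.
API_BASES = {
    "prod": "https://api.seaotter.ai",
    "dev": "https://dev-api.seaotter.ai",
}


def build_request(path, key, body=None, env="prod", use_alt_header=False):
    """Build (but do not send) an authenticated request for an API path.

    Hypothetical helper: a JSON body implies POST, no body implies GET.
    """
    headers = {"Content-Type": "application/json"}
    if use_alt_header:
        headers["X-OtterBench-Key"] = key  # alternative header named above
    else:
        headers["Authorization"] = f"Bearer {key}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        API_BASES[env] + path, data=data, headers=headers, method=method
    )


req = build_request("/api/v1/eval/policies", "sk-otter-...", env="dev")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without network access or a real key.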

## What SeaOtter is

- [OtterScore — the readiness evaluator](https://seaotter.ai/critics): a hostile-by-default critic aligned to find reasons to block, not to approve. It grades each artifact and its trajectory against your acceptance policy and returns a score (0–100, lower = more flawed), a band (ship / route_to_fix / quarantine / block), located flaws, and concrete upgrades.
- [Rubrics — the acceptance criteria](https://seaotter.ai/rubrics): per-modality, versioned criteria + weights the critic grades against. Browse, fork, and preview them.
- [Live demo — paste work, see the critic push back](https://seaotter.ai/demo/eval): the loop in the browser.
- [Developer / agent onboarding](https://seaotter.ai/developers): get a key, MCP / SDK / curl quickstart, the verdict schema.

## Agent quickstart (the loop)

The exact loop, in order. Steps 3–6 are pure HTTP (or MCP tools) and need only the key.

1. **Get a key — two ways.** (a) *Fully autonomous, no human:* `POST https://api.seaotter.ai/api/v1/agent-keys/signup` with `{ "email": "<owner@company>", "org_name": "<optional>" }` — creates a free-tier account and returns your `sk-otter-<40 hex>` secret (shown once) plus your `free_quota`. (b) *Human mint:* a signed-in org user mints a key at https://seaotter.ai/developers (`POST /api/v1/agent-keys`). Either way, use the secret as your bearer token.
2. **Connect.** Drop the hosted MCP server below into an MCP-speaking runtime (Claude / Codex / Cursor) — connect by URL, no install — or call the HTTP API directly.
3. **Score.** Send the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back a verdict.
4. **Read the flaws.** Each flaw has `criterion`, `severity`, `evidence`, `detail`, and an `anchor` (where: bbox / timestamp / cell / slide / page / span). `upgrades[]` are concrete fixes.
5. **Iterate.** Revise the work against the flaws and re-score (`POST /api/v1/eval/runs/{id}/iterate`) until `band` clears your gate (e.g. `ship`).
6. **Workflow / benchmark.** Score an end-to-end multi-step workflow topology with `POST /api/v1/eval/workflows/{id}/topology` for a composite + per-step + chain critique.
7. **Pay when the free quota runs out.** After `free_quota` grades the eval API returns `HTTP 402` with a `checkout_url` — a Stripe Checkout link your owner opens to add a payment method (or call `POST /api/v1/billing/pay-link` any time to fetch it; `GET /api/v1/billing/status` shows remaining free + billing state). Once paid, usage is metered and you keep grading.
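
The steps above can be sketched as a small control loop. This is an illustration under stated assumptions, not the reference client: `otter_loop`, `PaymentRequired`, and the injected `post` and `revise` callables are hypothetical names, the body sent to the iterate route is left entirely to the caller because its schema is not described here, and the iterate response is assumed to carry a `verdict` of the same shape as the first one. The transport is injected so the loop can run against a stand-in without a key or network access.

```python
class PaymentRequired(Exception):
    """HTTP 402 from the eval API; carries the checkout_url for the owner."""

    def __init__(self, checkout_url):
        super().__init__(f"free quota exhausted; checkout: {checkout_url}")
        self.checkout_url = checkout_url


def _check(status, body):
    # 402 means the free quota is used up (step 7); other non-2xx codes are unexpected.
    if status == 402:
        raise PaymentRequired(body.get("checkout_url"))
    if not 200 <= status < 300:
        raise RuntimeError(f"unexpected HTTP status {status}")


def otter_loop(post, first_payload, revise, target_band="ship", max_rounds=5):
    """Score once, then revise and re-score until the band clears the gate.

    post(path, payload) -> (status_code, parsed_json_body), supplied by the caller.
    revise(verdict) -> payload for the next iterate call (shape is an assumption).
    Returns the last verdict seen, which may still be below the gate.
    """
    status, body = post("/api/v1/eval/feedback", first_payload)
    _check(status, body)
    run_id, verdict = body["run_id"], body["verdict"]
    for _ in range(max_rounds):
        if verdict["band"] == target_band:
            break
        status, body = post(f"/api/v1/eval/runs/{run_id}/iterate", revise(verdict))
        _check(status, body)
        verdict = body["verdict"]  # assumption: same verdict shape as the first grade
    return verdict


def fake_post(path, payload):
    # Stand-in transport with made-up values: first grade needs fixes, the revision ships.
    if path.endswith("/iterate"):
        return 200, {"verdict": {"score": 91, "band": "ship", "flaws": [], "upgrades": []}}
    return 200, {
        "run_id": "run_1",
        "verdict": {"score": 58, "band": "route_to_fix", "flaws": [], "upgrades": []},
    }


final = otter_loop(fake_post, {"modality": "text"}, revise=lambda verdict: {})
print(final["band"])
```

Capping the loop with `max_rounds` keeps a stubborn artifact from burning quota indefinitely; the caller decides what to do with a verdict that never reaches the gate.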

## MCP

OtterScore is a **hosted MCP server** — connect by URL, no install, no package. The whole loop is exposed as read-only tools (no side effects → auto-approved in non-interactive runs).

- `.mcp.json` (Claude / Cursor) or `config.toml` `[mcp_servers.otterscore]` (Codex):

```json
{ "mcpServers": { "otterscore": {
    "url": "https://mcp.seaotter.ai/mcp",
    "headers": { "Authorization": "Bearer sk-otter-..." } } } }
```

- Tools the agent gets: `otter_list_policies`, `otter_score`, `otter_iterate`, `otter_score_async`, `otter_job_result`, `otter_score_stream`, `otter_score_workflow`, `otter_get_feedback_artifact`.
- For a slow/large grade (or `mode="agentic"` deep grading), prefer the non-blocking pair: `otter_score_async` returns a `job_id` immediately, then poll `otter_job_result(job_id)` until `status="completed"` — so the call never blocks or times out while the critic grades. `otter_score` stays the simple one-shot for quick grades.
- Your `sk-otter-...` key authenticates every call and bills your tenant — get one free at https://seaotter.ai/developers or `POST /api/v1/agent-keys/signup`. (Transport: MCP Streamable HTTP, stateless.)
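
The non-blocking pair described above amounts to a submit-then-poll pattern, sketched here. `call_tool(name, arguments)` is a hypothetical stand-in for however your MCP client invokes a tool; the dict-shaped results, the arguments passed to `otter_score_async`, and the `"running"` status used by the fake are assumptions. Only `job_id` and `status="completed"` come from the description above. Since no failure statuses are documented here, the timeout is the only backstop.

```python
import time


def wait_for_job(call_tool, job_id, poll_seconds=2.0, timeout_seconds=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll otter_job_result until the job reports status == "completed".

    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = call_tool("otter_job_result", {"job_id": job_id})
        if result.get("status") == "completed":
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_seconds}s")
        sleep(poll_seconds)


def make_fake_tool(pending_polls):
    # Stand-in MCP client: reports "running" (placeholder status) a few times, then completes.
    state = {"left": pending_polls}

    def call_tool(name, arguments):
        if name == "otter_score_async":
            return {"job_id": "job_1"}
        if state["left"] > 0:
            state["left"] -= 1
            return {"status": "running"}
        return {"status": "completed"}

    return call_tool


tool = make_fake_tool(pending_polls=2)
job = tool("otter_score_async", {"modality": "text"})  # argument shape is an assumption
result = wait_for_job(tool, job["job_id"], sleep=lambda seconds: None)
print(result["status"])
```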

## HTTP API

Every eval call carries `Authorization: Bearer <sk-otter-...>` and `Content-Type: application/json`. Base: `https://api.seaotter.ai` (prod) / `https://dev-api.seaotter.ai` (dev).

- `GET  /api/v1/eval/policies` — org acceptance policies you can condition grading on.
- `GET  /api/v1/eval/rubrics` — list rubrics (acceptance criteria); `GET /api/v1/eval/rubrics/{id}` for one.
- `POST /api/v1/eval/feedback` — one-shot grade → flat verdict + `run_id` to keep iterating (the OtterLoop convenience entry).
- `POST /api/v1/eval/runs` — create a run + first verdict (lower-level: full conditioning slots).
- `POST /api/v1/eval/runs/{id}/iterate` — submit a revision, get the next verdict.
- `GET  /api/v1/eval/runs/{id}` / `GET /api/v1/eval/runs/{id}/score` — fetch a run / its latest score.
- `POST /api/v1/eval/workflows/{id}/topology` — score an end-to-end workflow graph (composite + per-step + chain critique).
- `POST /api/v1/eval/feedback` returns rich feedback artifacts when `return_feedback_artifacts: true`; fetch one with `GET /api/v1/eval/feedback-artifacts/{ref}`.
- `GET/POST /api/v1/agent-keys` — list / mint eval keys (requires a signed-in org user, not an eval key).

One-shot score over HTTP:

```bash
curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"en",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
```

The response carries `run_id` and a `verdict` (`score`, `band`, `flaws[]`, `upgrades[]`). Use the `run_id` to iterate.
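
As a rough illustration of reading that response, the sketch below turns a parsed JSON body into a short report and a gate check. The field names are the ones listed in this file; the helper names `summarize_verdict` and `passes_gate`, every sample value, the severity label `"high"`, and the treatment of each `upgrades[]` entry as printable text are assumptions.

```python
def summarize_verdict(response):
    """Return human-readable lines for a parsed feedback response (a dict)."""
    verdict = response["verdict"]
    lines = [f"run {response['run_id']}: score {verdict['score']} band {verdict['band']}"]
    for flaw in verdict.get("flaws", []):
        lines.append(
            f"- [{flaw.get('severity')}] {flaw.get('criterion')}: {flaw.get('detail')}"
        )
    for upgrade in verdict.get("upgrades", []):
        lines.append(f"  fix: {upgrade}")
    return lines


def passes_gate(response, gate="ship"):
    """True when the verdict's band equals the band you gate on."""
    return response["verdict"]["band"] == gate


# Made-up sample body; real values come from the API.
sample = {
    "run_id": "run_1",
    "verdict": {
        "score": 58,
        "band": "route_to_fix",
        "flaws": [
            {"criterion": "completeness", "severity": "high",
             "evidence": "no timeline section", "detail": "Postmortem lacks a timeline"}
        ],
        "upgrades": ["Add a minute-by-minute incident timeline"],
    },
}

for line in summarize_verdict(sample):
    print(line)
print("passes gate:", passes_gate(sample))
```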

## Get a key

- [Developer / agent console](https://seaotter.ai/developers): an org user creates an account (https://seaotter.ai/signup), signs in, and mints a key (`POST /api/v1/agent-keys`, body `{ "name": "my-agent" }`). The full `sk-otter-...` secret is returned once. Hand it to your agent as `OTTERLOOP_API_KEY` / the bearer token.
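- The console path needs one human step: an org mints the key once, and the agent then uses it for every step in the quickstart. For the no-human path, the quickstart lists `POST /api/v1/agent-keys/signup`; /docs/agent-native.md still tracks no-human-step agent self-signup as a documented follow-up, so confirm that route is available in your environment before relying on it.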

## API reference

- [OpenAPI spec (machine-readable)](https://api.seaotter.ai/api/v1/openapi.json): full schemas for the eval, agent-key, rubric, and policy routes.
- [Interactive API docs](https://api.seaotter.ai/api/v1/docs): Swagger UI.
- [Critics catalog](https://seaotter.ai/critics) · [Rubric library](https://seaotter.ai/rubrics) · [Live demo](https://seaotter.ai/demo/eval)

## Optional

- [Agent-native contract (for maintainers)](https://seaotter.ai/docs/agent-native): discovery → register → key → MCP/HTTP → score/iterate/workflow/benchmark, and the known self-signup follow-up.
- [Python SDK (otterloop)](https://pypi.org/project/otterloop/): `OtterLoopClient` wraps the same HTTP surface; `otter.loop(produce=..., work=..., target_band="ship")` drives produce → grade → revise.