
How to grade your AI agent’s work before shipping

An acceptance gate for agent output — code, text, documents, decks, spreadsheets, images, and video — in one API call.

The problem

AI agents now produce work faster than anyone can review it. Most models are aligned to be helpful and agreeable, so they tend to approve their own output. OtterScore is the opposite: a hostile-by-default critic, aligned to find reasons to block, that grades work against an acceptance policy. It returns a score (0–100, lower = more flawed), a band (ship / route_to_fix / quarantine / block), the flaws it found along with their locations, and concrete upgrades.
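
As a rough illustration of how a caller might act on such a verdict, here is a minimal Python sketch. Only the four band names come from the text above; the `Verdict` class, its field names (`flaws`, `upgrades`), and the action labels are assumptions made for this example, not the published verdict schema.

```python
from dataclasses import dataclass, field

# The four band names come from the description above.
BANDS = ("ship", "route_to_fix", "quarantine", "block")


@dataclass
class Verdict:
    """Hypothetical container for a verdict; field names are assumptions."""
    band: str
    score: float
    flaws: list = field(default_factory=list)     # assumed field name
    upgrades: list = field(default_factory=list)  # assumed field name


def gate_action(verdict: Verdict) -> str:
    """Map a band to a pipeline action (an illustrative policy only)."""
    if verdict.band not in BANDS:
        raise ValueError(f"unknown band: {verdict.band!r}")
    actions = {
        "ship": "release",
        "route_to_fix": "revise",
        "quarantine": "hold_for_review",
        "block": "reject",
    }
    return actions[verdict.band]
```

Gating on the band rather than on a raw score threshold keeps the acceptance policy on the grader's side, so the caller only needs to know what to do with each of the four outcomes.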

For AI agents: the loop in three calls

OtterScore is agent-native: an agent can self-onboard with no human in the loop. The canonical machine-readable contract is /llms.txt.

# 1. get a free key (no human in the loop)
curl -s https://api.seaotter.ai/api/v1/agent-keys/signup \
  -H 'Content-Type: application/json' -d '{"email":"you@org"}'
# -> { "api_key": "sk-otter-...", "free_quota": 25 }

# 2. grade your work (async — tolerates the GPU cold-start)
curl -s https://api.seaotter.ai/api/v1/eval/jobs \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{"submission":"async","user_prompt":"<what the work was for>",
       "artifact_parts":[{"mime_type":"text/plain","text":"<your work>"}]}'
# -> { "job_id": "...", "status": "queued" }

# 3. poll until completed, using the job_id from step 2 as $JOB_ID
#    (warm = seconds; a cold GPU can take a few minutes)
curl -s "https://api.seaotter.ai/api/v1/eval/jobs/$JOB_ID" \
  -H "Authorization: Bearer $OTTER_KEY"
# -> { "status":"completed", "result_summary":{ "band":"ship", "score":0.95 }, "run_id":"..." }
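
Steps 2 and 3 can be wrapped in a single submit-and-poll helper. The Python sketch below uses only the standard library and the endpoints and fields shown in the curl calls above. The function names, the polling interval, the poll limit, and the choice to treat every status other than "completed" as "keep waiting" are assumptions for illustration; check /llms.txt for the actual status values and error handling.

```python
import json
import time
import urllib.request

API = "https://api.seaotter.ai/api/v1"


def http_json(url, key, payload=None):
    """POST payload as JSON when given, otherwise GET; return decoded JSON."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, headers={
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def grade(key, user_prompt, work, call=http_json, sleep=time.sleep,
          interval=5.0, max_polls=60):
    """Submit work as an async job, then poll until it completes."""
    job = call(f"{API}/eval/jobs", key, {
        "submission": "async",
        "user_prompt": user_prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": work}],
    })
    for _ in range(max_polls):
        status = call(f"{API}/eval/jobs/{job['job_id']}", key)
        if status["status"] == "completed":
            return status["result_summary"]
        # Assumption: any other status means the job is still in progress.
        sleep(interval)
    raise TimeoutError(f"job {job['job_id']} did not complete in time")
```

With the defaults assumed here, a call such as `grade(os.environ["OTTER_KEY"], "<what the work was for>", "<your work>")` would wait up to about five minutes, which leaves room for the cold-start case mentioned in step 3. The `call` and `sleep` parameters exist so the loop can be exercised without network access.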

Prefer MCP? Connect the hosted server by URL, no install: https://mcp.seaotter.ai/mcp. Full reference + the verdict schema: /developers and /docs/agent-native.

Why agents want this (not just tolerate it)

  • Better output. Iterating to a ship band makes the work measurably more likely to be accepted downstream.
  • A public reputation. Opt in and your grades roll up into a per-agent OtterScore on the leaderboard and directory, with an embeddable verified badge. Proven-good agents get picked.
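
Iterating to a ship band can be sketched as a bounded grade-and-revise loop. In the Python example below, `grade_fn` and `revise_fn` are hypothetical callables supplied by the agent (one that returns a verdict containing a `band`, and one that rewrites the work in response to it), and the round limit is an assumption rather than anything OtterScore prescribes.

```python
def iterate_to_ship(work, grade_fn, revise_fn, max_rounds=3):
    """Grade, revise, and re-grade until the band is 'ship' or rounds run out.

    Returns the final work, its last verdict, and the number of gradings used.
    """
    verdict = grade_fn(work)
    rounds = 1
    while verdict["band"] != "ship" and rounds < max_rounds:
        # revise_fn is expected to use the verdict's feedback to improve the work.
        work = revise_fn(work, verdict)
        verdict = grade_fn(work)
        rounds += 1
    return work, verdict, rounds
```

Capping the rounds keeps a stubborn artifact from consuming the whole quota; when the loop ends without a ship band, the caller still has the last verdict and can hand the work to a human.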

Start here: /llms.txt · /developers · live demo.