BUILD WITH OTTERLOOP

Wire the critic into an agent in minutes.

OtterLoop is the agent-side contract for SeaOtter's adversarial critic. The same loop works on or alongside AgentOS, across any framework, model, and cloud: submit work, read the verdict, revise, and iterate until the band clears your gate.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

INTEGRATION

Three copy-paste starts. One contract.

Everything routes to the same eval contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit record. The MCP server and the Python SDK are thin wrappers over this HTTP surface.

Localize `detail`, `rationale`, and `upgrades` by locale.
Anchor on spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs media bytes.
Keep the canonical feedback bundle in JSON for fail-safe automation.
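Because the JSON bundle is canonical, automation can gate on it alone. A minimal sketch of a fail-closed reader, assuming the `{ run_id, verdict }` response shape of the HTTP API and assuming that treating unreadable output as `block` is the policy you want; `gate_band` is a hypothetical helper, not part of the SDK.

```python
import json

# Bands as listed in the quickstart; defaulting to "block" is an assumed fail-closed policy.
KNOWN_BANDS = {"ship", "route_to_fix", "quarantine", "block"}


def gate_band(raw: str) -> str:
    """Return the verdict band from a raw feedback response, failing closed.

    Hypothetical helper: a parse error, a missing field, or an unknown band
    yields "block", so a malformed response can never pass a gate.
    """
    try:
        payload = json.loads(raw)
        band = payload["verdict"]["band"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "block"
    return band if isinstance(band, str) and band in KNOWN_BANDS else "block"
```

Failing closed means a truncated or reshaped response stops the pipeline instead of shipping unreviewed work.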

INTEGRATION

Three ways in.

MCP

Use it in Claude, Codex, Cursor, or any MCP-capable runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

One-shot score call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

Python SDK

Let the client drive produce → grade → revise until "ship".

from otterloop import OtterLoopClient
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
# my_agent stands for your own agent object; loop() grades each draft and hands the feedback to produce()
final = otter.loop(produce=lambda feedback: my_agent.revise(feedback), work=my_agent.first_draft(), modality="document",
                   references=["file://brand-guide.pdf", "file://gold-postmortem.md"], max_rounds=5, target_band="ship")
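`otter.loop()` drives this cycle for you. To make the control flow visible, here is a minimal sketch of a produce → grade → revise loop with the grader injected as a plain function so it runs offline. It is not the SDK's implementation; `run_loop`, the `grade` callable, and its `(band, feedback)` return shape are assumptions for illustration.

```python
from typing import Callable, Tuple

# Hypothetical grader signature: returns (band, feedback). In practice it would call the eval API.
Grader = Callable[[str], Tuple[str, str]]


def run_loop(work: str, produce: Callable[[str], str], grade: Grader,
             target_band: str = "ship", max_rounds: int = 5) -> Tuple[str, str, int]:
    """Grade `work`, then revise with `produce(feedback)` until the band equals
    `target_band` or `max_rounds` gradings have run. Returns (work, band, rounds)."""
    band, feedback = grade(work)
    rounds = 1
    while band != target_band and rounds < max_rounds:
        work = produce(feedback)        # the agent revises against the feedback
        band, feedback = grade(work)    # re-score the revision
        rounds += 1
    return work, band, rounds
```

This sketch stops on an exact band match; a gate that accepts any band at or above a threshold needs an explicit ordering of bands.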

DEVELOPER CONSOLE

Get your eval API key

Create an eval API key for your account, then copy a preconfigured MCP, Python SDK, or curl setup to connect any agent to SeaOtter's adversarial critic. The secret is shown only once, so store it before you leave this page.


AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above); the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Base URLs: https://api.seaotter.ai (prod) and https://dev-api.seaotter.ai (dev).
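Since the MCP server and SDK wrap this HTTP surface, a stdlib-only client is small. A minimal sketch that builds (but does not send) a scoring request using the base URLs and bearer header above; the `build_feedback_request` name and the use of `urllib` are illustrative choices, and the body fields follow the curl examples on this page.

```python
import json
import urllib.request

BASES = {"prod": "https://api.seaotter.ai", "dev": "https://dev-api.seaotter.ai"}


def build_feedback_request(api_key: str, body: dict, env: str = "prod") -> urllib.request.Request:
    """Build a POST to /api/v1/eval/feedback with bearer auth; nothing is sent here."""
    url = f"{BASES[env]}/api/v1/eval/feedback"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(req) would send it; the response body should be the { run_id, verdict } JSON.
```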

1. Get a key — Create an account at /signup, then, signed in as an org user, mint an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY or as the bearer token.
2. Connect (MCP or HTTP) — Drop the .mcp.json below into Claude or Cursor (Codex takes the equivalent config.toml entry) to get the read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. Install with pip install "otterloop[mcp]" (or pip install otterloop for the stdlib-only SDK).
3. Score — POST /api/v1/eval/feedback with the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws — Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate — Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark — POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get back a composite score plus per-step and chain-level critiques. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
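Steps 4 and 5 can be sketched as a small triage helper: order flaws so the agent fixes the worst first, and attach any upgrade whose `target_criterion` matches the flaw's `criterion`. The severity scale below is an assumption (the documented example only shows `"high"`), so check the values your verdicts actually return; `triage` is a hypothetical name.

```python
# Assumed severity scale; only "high" appears in the documented example.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def triage(verdict: dict) -> list:
    """Return (flaw, matching_upgrades) pairs, most severe flaws first.
    Unknown severities sort last; sorting is stable for equal severities."""
    upgrades = verdict.get("upgrades", [])
    flaws = sorted(verdict.get("flaws", []),
                   key=lambda f: SEVERITY_RANK.get(f.get("severity"), len(SEVERITY_RANK)))
    return [(f, [u for u in upgrades if u.get("target_criterion") == f.get("criterion")])
            for f in flaws]
```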

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
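Because the secret is shown exactly once, it is worth sanity-checking its shape before storing it. A minimal sketch that checks the `sk-otter-<40 hex>` format from step 1; the page does not say whether the hex is lowercase only, so the pattern accepts both cases (an assumption), and `looks_like_eval_key` is a hypothetical name.

```python
import re

# "sk-otter-" followed by exactly 40 hex characters; accepting upper case is an assumption.
_KEY_RE = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def looks_like_eval_key(secret: str) -> bool:
    """Cheap local format check; it cannot tell whether the key is valid server-side."""
    return _KEY_RE.fullmatch(secret.strip()) is not None
```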

2 — connect over MCP (.mcp.json)

For Claude and Cursor. Codex uses an [mcp_servers.otterloop] table in config.toml instead.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

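For Codex, the same server entry becomes a table in `config.toml`. A minimal sketch that renders one `.mcp.json` server entry as an `[mcp_servers.<name>]` table with `command`, `args`, and an inline `env` table; it assumes Codex reads those three keys (compare against the Codex configuration docs), handles string values only, and assumes env names are valid TOML bare keys. `mcp_entry_to_codex_toml` is a hypothetical helper.

```python
import json


def _toml_str(value: str) -> str:
    # json.dumps escapes (\" \\ \n \t \uXXXX) are also valid TOML basic-string escapes.
    return json.dumps(value)


def mcp_entry_to_codex_toml(name: str, entry: dict) -> str:
    """Render one mcpServers entry as a Codex [mcp_servers.<name>] table."""
    lines = [f"[mcp_servers.{name}]", f"command = {_toml_str(entry['command'])}"]
    if entry.get("args"):
        lines.append("args = [" + ", ".join(_toml_str(a) for a in entry["args"]) + "]")
    if entry.get("env"):
        # Env names like OTTERLOOP_API_KEY are used as bare keys without quoting.
        pairs = ", ".join(f"{k} = {_toml_str(v)}" for k, v in entry["env"].items())
        lines.append("env = { " + pairs + " }")
    return "\n".join(lines) + "\n"
```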
3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

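Chaining the score and iterate calls comes down to carrying `run_id` forward. A minimal sketch that derives the iterate path and body from a previous feedback response; the `inline:v<n>` labels mirror the `inline:v2` example but are an assumed scheme for naming revisions, and `iterate_call` is a hypothetical helper.

```python
import json
from urllib.parse import quote


def iterate_call(previous_response: dict, revised_text: str, version: int) -> tuple:
    """Return (path, body_json) for POST /api/v1/eval/runs/{run_id}/iterate."""
    run_id = quote(str(previous_response["run_id"]), safe="")  # from the scoring response
    path = f"/api/v1/eval/runs/{run_id}/iterate"
    body = {
        "decision": "reprompt",
        "new_artifact_ref": f"inline:v{version}",  # hypothetical labeling scheme
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }
    return path, json.dumps(body)
```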
VERDICT CONTRACT

The agent acts on a schema.

The verdict is designed for frontier agents, not as a screenshot of a human review. It carries a score, band, flaws, upgrades, anchors, a rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema

{
  "score": 91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
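To act on the schema rather than on prose, an agent can load the verdict into typed objects. A minimal sketch with standard-library dataclasses covering the fields in the example above; treating every field other than `score` and `band` as optional is an assumption, and unknown flaw or upgrade keys raise `TypeError`, which surfaces schema drift early.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Flaw:
    criterion: str
    severity: str
    evidence: str = ""
    detail: str = ""
    anchor: Optional[dict] = None  # e.g. {"kind": "span", "span": [418, 462]}


@dataclass
class Upgrade:
    action: str
    target_criterion: str = ""
    draft: str = ""


@dataclass
class Verdict:
    score: float
    band: str
    decision: Optional[str] = None
    flaws: List[Flaw] = field(default_factory=list)
    upgrades: List[Upgrade] = field(default_factory=list)
    rationale: str = ""
    feedback_artifacts: List[dict] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict) -> "Verdict":
        return cls(
            score=d["score"], band=d["band"], decision=d.get("decision"),
            flaws=[Flaw(**f) for f in d.get("flaws", [])],
            upgrades=[Upgrade(**u) for u in d.get("upgrades", [])],
            rationale=d.get("rationale", ""),
            feedback_artifacts=d.get("feedback_artifacts", []),
        )
```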

CONDITIONING

The critic is conditioned on your bar.

OtterLoop is not a generic "is this good?" rating. The contract can condition the verdict on your organization's policy, on the prompt or intent the agent was given, and on the reference files the work must follow.

Organization policy

Apply the right acceptance policy, so the same artifact can pass for one team and fail for another for defensible reasons.

Prompt and intent

Carry the original brief into the critic so it judges the work against the assignment, not against an idealized generic answer.

Reference files

Brand guides, gold examples, single-source docs, and earlier iterations become conditioning evidence.
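The three conditioning inputs map onto fields of the scoring request: `policy_id`, the `prompt`, and `references` (optional alongside `locale`, per step 3 of the quickstart). A minimal sketch that assembles the body and omits unset fields; passing references as a list of URI strings follows the SDK example (`file://...`) and is an assumption for the HTTP body. `conditioned_body` is a hypothetical helper.

```python
from typing import List, Optional


def conditioned_body(modality: str, prompt: str, text: str,
                     policy_id: Optional[str] = None,
                     locale: Optional[str] = None,
                     references: Optional[List[str]] = None) -> dict:
    """Build a /api/v1/eval/feedback body; empty conditioning values are dropped."""
    body = {
        "modality": modality,
        "prompt": prompt,  # the brief the agent was given, so grading is against the task
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
    }
    optional = {"policy_id": policy_id, "locale": locale, "references": references}
    body.update({k: v for k, v in optional.items() if v})
    return body
```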

Anchors localize to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model prose standing in for a gate.
Rich returns let the same verdict drive both human review and machine revision.
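Because the band is a policy decision, the gate itself can be plain code. A minimal sketch, assuming the four bands from the quickstart are ranked best to worst as ship, route_to_fix, quarantine, block (the page lists them in that order but does not say they are ranked); unknown bands never clear. `clears_gate` is a hypothetical name.

```python
# Assumed ordering, best first.
BAND_ORDER = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(band: str, gate: str = "ship") -> bool:
    """True if `band` is at least as good as `gate`; unknown bands or gates never clear."""
    if band not in BAND_ORDER or gate not in BAND_ORDER:
        return False
    return BAND_ORDER.index(band) <= BAND_ORDER.index(gate)
```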

MODALITIES

Multimodal in. Richly multimodal out.

The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include both the canonical verdict JSON and media that a human or an agent can read.


MODALITIES	RETURNS
Image or design frame	Annotated PNG plus bounding boxes of the flaws and a Markdown report
Deck, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Highlighted cells, per-criterion notes, and structured deltas
Video or audio	Timestamp markers, captions, and localized rationale
Text or code	Span-anchored review with upgrade drafts the agent can apply
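For text and code, a span anchor points at the exact failing region. A minimal sketch that pulls the flagged text out of the artifact, assuming `span` is a `[start, end)` pair of character offsets into the submitted text (the example `[418, 462]` does not say whether offsets are characters or bytes, or whether `end` is inclusive); other anchor kinds are left to the caller, and `span_excerpt` is a hypothetical name.

```python
def span_excerpt(text: str, anchor: dict, context: int = 0) -> str:
    """Return the anchored slice of `text`, optionally widened by `context` characters."""
    if anchor.get("kind") != "span":
        raise ValueError(f"not a span anchor: {anchor.get('kind')!r}")
    start, end = anchor["span"]
    return text[max(0, start - context):end + context]
```

For example, with the text `"Revenue grew 40% in Q3."` and the anchor `{"kind": "span", "span": [13, 16]}`, the excerpt is `"40%"`.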
