BUILD WITH OTTERLOOP

Wire the critic into an agent in minutes.

OtterLoop is the agent-facing contract for SeaOtter's hostile critic. The same loop works with or without AgentOS, on any framework, model, and cloud: submit the work, read the verdict, revise, and iterate until it clears the band.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

INTEGRATION

Three copy-paste quickstarts. One contract.

Everything converges on the same evaluation contract. The hosted API handles critic execution, conditioning, localization, rich feedback, and the signed audit record. The MCP server and the Python SDK are thin wrappers over this HTTP surface.

Localize `detail`, `rationale`, and `upgrades` per locale.
Anchor on spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback bundle in JSON for robust automation.

INTEGRATION

Three ways in.

MCP

Use it with Claude, Codex, Cursor, or any MCP-compatible runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

One-shot scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

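For agents that call the API from Python without the SDK, the curl call above can be sketched with the standard library alone. This is a minimal sketch, not the SDK's implementation: the helper names `build_feedback_request` and `send` are hypothetical, while the URL, headers, and body fields are copied from the curl example.

```python
import json
import urllib.request

API_URL = "https://dev-api.seaotter.ai"  # dev base, as in the curl example


def build_feedback_request(api_key: str, work: str, prompt: str,
                           policy_id: str = "acme-prod-acceptance",
                           locale: str = "ja") -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = {
        "modality": "text",
        "policy_id": policy_id,
        "locale": locale,
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": work}],
        "return_feedback_artifacts": True,
    }
    return urllib.request.Request(
        f"{API_URL}/api/v1/eval/feedback",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Perform the network call and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from `send` lets an agent log or unit-test the payload without touching the network.
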
Python SDK

Let the client drive produce → score → revise until the work ships.

from otterloop import OtterLoopClient

# my_agent is your own agent object; it must provide first_draft() and revise(feedback).
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5,
    target_band="ship",
)
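
To make the control flow of such a loop concrete, here is a minimal sketch in plain Python. It is not the SDK's code: `run_loop` and `fake_score` are hypothetical names, the stub critic stands in for a real scoring call, and the assumption that `produce` receives the whole verdict is ours.

```python
from typing import Callable, Dict

Verdict = Dict[str, object]


def run_loop(score: Callable[[str], Verdict],
             produce: Callable[[Verdict], str],
             work: str,
             max_rounds: int = 5,
             target_band: str = "ship") -> str:
    """Score the work, revise on feedback, stop once the band matches the target."""
    for _ in range(max_rounds):
        verdict = score(work)
        if verdict["band"] == target_band:
            return work
        work = produce(verdict)
    # Rounds exhausted: the last revision is returned without a final score.
    return work


def fake_score(text: str) -> Verdict:
    """Stub critic for illustration only: ships any draft that cites a source."""
    ok = "[source]" in text
    return {"band": "ship" if ok else "route_to_fix",
            "flaws": [] if ok else [{"criterion": "source_grounding"}]}


final = run_loop(fake_score, lambda verdict: "Revised draft [source]", "First draft")
print(final)  # prints: Revised draft [source]
```

Passing the critic and the reviser in as callables keeps the loop testable offline; a caller would swap `fake_score` for a real scoring call.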

DEVELOPER CONSOLE

Get your eval API key

Mint an eval API key for your account, then copy a ready-to-paste MCP, Python SDK, or curl configuration to wire any agent to SeaOtter's hostile critic. The secret is shown only once, so store it before you leave this page.

AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key — A signed-in org user creates an account at /signup, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY / the bearer token.
2. Connect (MCP or HTTP) — Drop the .mcp.json below into Claude / Codex / Cursor for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score — POST /api/v1/eval/feedback with the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws — Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate — Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark — POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite + per-step + chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
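
Steps 3 to 5 come down to two decisions on the agent side: does the band clear the gate, and if not, which flaws should drive the revision. A minimal sketch follows. It assumes the bands are ordered best to worst in the order they are listed in step 3, which the text does not state outright. The helper names are hypothetical, and the sample response (including its `run_id` and score) is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed ordering, best to worst, taken from the order the bands are listed in.
BANDS = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(band: str, gate: str = "ship") -> bool:
    """True when `band` is at least as good as the `gate` band."""
    return BANDS.index(band) <= BANDS.index(gate)


def flaws_by_criterion(verdict: dict) -> Dict[str, List[str]]:
    """Group flaw details under the criterion they violate."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for flaw in verdict.get("flaws", []):
        grouped[flaw["criterion"]].append(flaw["detail"])
    return dict(grouped)


# Illustrative response only; the values are not real API output.
response = {
    "run_id": "run-123",
    "verdict": {
        "score": 62,
        "band": "route_to_fix",
        "flaws": [{"criterion": "source_grounding", "severity": "high",
                   "evidence": "Unsupported number",
                   "detail": "The claim is not backed by the cited file"}],
        "upgrades": [],
    },
}

verdict = response["verdict"]
if not clears_gate(verdict["band"]):
    # Hand the grouped flaws to the agent's revision step.
    print(flaws_by_criterion(verdict))
```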

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
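
Because the secret is returned only once, it can help to sanity-check and redact it before storing or logging it. The sketch below relies only on the `sk-otter-<40 hex>` shape described in step 1. Accepting both upper- and lower-case hex is an assumption, and `looks_like_eval_key` and `redact` are hypothetical helper names.

```python
import re

# Shape taken from the quickstart: "sk-otter-" followed by 40 hex characters.
# Accepting upper-case hex as well is an assumption.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def looks_like_eval_key(value: str) -> bool:
    """Cheap local check that a string has the documented key shape."""
    return KEY_PATTERN.fullmatch(value) is not None


def redact(key: str, visible: int = 14) -> str:
    """Keep a short prefix for logs and drop the rest of the secret."""
    return key[:visible] + "..."


sample = "sk-otter-" + "ab12" * 10  # fabricated key, for illustration only
print(looks_like_eval_key(sample), redact(sample))
```

The default of 14 visible characters mirrors the length of the `key_prefix` value in the sample response above.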

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

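If the agent's environment is provisioned by a script, the same `.mcp.json` can be generated rather than pasted, which keeps the secret out of version-controlled templates. This is a sketch under that assumption. `mcp_config` and `write_mcp_config` are hypothetical helpers, and the structure simply mirrors the JSON above.

```python
import json
import os


def mcp_config(api_key: str,
               api_url: str = "https://api.seaotter.ai",
               policy_id: str = "acme-prod-acceptance") -> dict:
    """Return the same structure as the .mcp.json shown above."""
    return {
        "mcpServers": {
            "otterloop": {
                "command": "python",
                "args": ["-m", "otterloop.mcp_server"],
                "env": {
                    "OTTERLOOP_API_URL": api_url,
                    "OTTERLOOP_API_KEY": api_key,
                    "OTTERLOOP_POLICY_ID": policy_id,
                },
            }
        }
    }


def write_mcp_config(path: str, api_key: str) -> None:
    """Write the config; 0o600 applies only when the file is newly created."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(mcp_config(api_key), handle, indent=2)
```
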
3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

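The iterate call can be built the same way from Python. In this sketch the body fields are copied from the curl example. Numbering revisions as `inline:v2`, `inline:v3`, and so on is an assumption extrapolated from the single `inline:v2` value shown, and `build_iterate_request` is a hypothetical helper name.

```python
import json
import urllib.request
from urllib.parse import quote

API_URL = "https://api.seaotter.ai"  # prod base, as in the curl example


def build_iterate_request(api_key: str, run_id: str, revised: str,
                          version: int = 2) -> urllib.request.Request:
    """Build (but do not send) a re-score request for an existing run."""
    body = {
        "decision": "reprompt",
        # Assumed naming scheme for successive revisions.
        "new_artifact_ref": f"inline:v{version}",
        "artifact_parts": [{"mime_type": "text/plain", "text": revised}],
    }
    return urllib.request.Request(
        # Percent-encode the run id so it cannot alter the URL path.
        f"{API_URL}/api/v1/eval/runs/{quote(run_id, safe='')}/iterate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```
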
VERDICT CONTRACT

The agent acts on a single schema.

The verdict is designed for frontier agents, not for screenshots of human review. It carries score, band, flaws, upgrades, anchors, rationale, and rich feedback artifact references that the agent can use directly.

Verdict schema

{
  "score": 91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
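
An agent that consumes this schema may want typed access and a way to pull out the exact text a flaw points at. The sketch below is one way to do that, not an official client type. The class and function names are hypothetical, the sample verdict is made up, and treating `span` as a half-open `[start, end)` pair of character offsets is an assumption the schema does not confirm.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Flaw:
    criterion: str
    severity: str
    evidence: str
    detail: str
    anchor: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Verdict:
    score: float
    band: str
    flaws: List[Flaw]
    upgrades: List[Dict[str, Any]]
    rationale: str = ""


def parse_verdict(raw: Dict[str, Any]) -> Verdict:
    """Copy the documented fields out of a verdict dict, ignoring unknown keys."""
    flaws = [
        Flaw(f["criterion"], f["severity"], f["evidence"], f["detail"],
             f.get("anchor", {}))
        for f in raw.get("flaws", [])
    ]
    return Verdict(raw["score"], raw["band"], flaws,
                   raw.get("upgrades", []), raw.get("rationale", ""))


def flagged_text(work: str, flaw: Flaw) -> Optional[str]:
    """Return the flagged slice for span anchors, None for other anchor kinds.

    Assumes the span is [start, end) in character offsets.
    """
    if flaw.anchor.get("kind") != "span":
        return None
    start, end = flaw.anchor["span"]
    return work[start:end]


work = "Revenue grew 40% last quarter."
verdict = parse_verdict({
    "score": 62, "band": "route_to_fix",  # illustrative values
    "flaws": [{"criterion": "source_grounding", "severity": "high",
               "evidence": "Unsupported number",
               "detail": "The claim is not backed by the cited file",
               "anchor": {"kind": "span", "span": [13, 16]}}],
    "upgrades": [],
})
print(flagged_text(work, verdict.flaws[0]))  # prints: 40%
```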

CONDITIONING

The critic is conditioned on your bar.

OtterLoop is not a generic "is it good?" score. The contract can condition the verdict on your organization's policy, the prompt or intent given to the agent, and the reference files the work must respect.

Organization policy

Apply the right acceptance policy so that the same artifact passes in one team and fails in another, for a defensible reason.

Prompt and intent

Carry the original request into the critic so that it judges the work against the assignment, not against a generic ideal answer.

Reference files

Brand guides, gold examples, source-of-truth documents, and previous iterations become conditioning evidence.

Anchors localize to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model text posing as a gate.
Rich feedback lets the same verdict feed both human review and machine revision.
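
As a rough illustration of the three conditioning inputs, the sketch below assembles a scoring payload in which policy, prompt, and references are explicit. The field names `policy_id`, `locale`, `prompt`, and `references` come from the quickstart. Sending references as a list of URI strings over HTTP is an assumption borrowed from the SDK example, and both `conditioned_payload` and the second policy id are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def conditioned_payload(work: str, prompt: str,
                        policy_id: Optional[str] = None,
                        locale: Optional[str] = None,
                        references: Sequence[str] = ()) -> Dict[str, Any]:
    """Build a text scoring payload, omitting conditioning fields left unset."""
    payload: Dict[str, Any] = {
        "modality": "text",
        "prompt": prompt,  # the assignment the work is judged against
        "artifact_parts": [{"mime_type": "text/plain", "text": work}],
    }
    if policy_id:
        payload["policy_id"] = policy_id
    if locale:
        payload["locale"] = locale
    if references:
        # Assumed wire format: a list of URI strings, as in the SDK example.
        payload["references"] = list(references)
    return payload


draft = "...your work..."
prompt = "Draft the Q3 incident postmortem"

# Same artifact, two bars: only the policy id differs between the payloads.
for_prod = conditioned_payload(draft, prompt, policy_id="acme-prod-acceptance",
                               references=["file://gold-postmortem.md"])
for_draft_review = conditioned_payload(draft, prompt,
                                       policy_id="acme-draft-review")  # hypothetical policy
```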

MODALITIES

Multimodal input. Rich multimodal output.

The same loop covers text, code, images, slide decks, documents, spreadsheets, audio, video, and multi-step trajectories. Feedback can include the canonical verdict JSON and media readable by humans or agents.

Try the live demo · Browse rubrics

MODALITY	FEEDBACK
Image or design frame	Annotated PNG plus flaw bounding boxes and a Markdown report
Slide deck, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Flagged cells, criterion-anchored notes, and structured deltas
Video or audio	Time markers, captions, and localized rationale
Text or code	Span-anchored review with upgrade drafts the agent can apply
