BUILD WITH OTTERLOOP

Connect the critic to an agent in minutes.

OtterLoop is the agent-side contract for SeaOtter's hostile critic. The same loop works on or off AgentOS, with any framework, model, and cloud: submit the work, read the verdict, revise, and iterate until the band clears your gate.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

INTEGRATION

Three copy-and-paste starts. One contract.

Everything flows into the same evaluation contract. The hosted API handles critic execution, conditioning, localization, rich returns, and the signed audit record. The MCP server and the Python SDK are thin wrappers over that HTTP surface.

Localize `detail`, `rationale`, and `upgrades` by language/locale.
Anchor to spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback bundle in JSON for fallback-resilient automation.

INTEGRATION

Three entry points.

MCP

Use it in Claude, Codex, Cursor, or any runtime that speaks MCP.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

One-shot scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

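The same one-shot call can be sketched with Python's standard library. This is a minimal sketch, not the official SDK: `build_feedback_request` is a hypothetical helper, the endpoint and body fields are copied from the curl example above, and the request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example (dev base URL).
API_URL = "https://dev-api.seaotter.ai/api/v1/eval/feedback"

def build_feedback_request(api_key: str, prompt: str, text: str,
                           policy_id: str = "acme-prod-acceptance",
                           locale: str = "ja") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "modality": "text",
        "policy_id": policy_id,
        "locale": locale,
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
        "return_feedback_artifacts": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder key and artifact text, as in the curl example.
req = build_feedback_request("sk-otter-...", "Draft the Q3 incident postmortem", "...")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.
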
Python SDK

Let the client drive produce → evaluate → revise until release.

from otterloop import OtterLoopClient

# my_agent is your own agent object, defined elsewhere.
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5, target_band="ship",
)

DEVELOPER CONSOLE

Get your eval API key

Generate an evaluation API key for your account, then copy a ready-made configuration for MCP, the Python SDK, or curl to connect any agent to SeaOtter's hostile critic. The secret is shown only once, so save it before leaving the page.


AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key — A signed-in org user creates an account at /signup, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY / the bearer token.
2. Connect (MCP or HTTP) — Drop the .mcp.json below into Claude / Codex / Cursor for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score — POST /api/v1/eval/feedback with the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws — Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate — Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark — POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite + per-step + chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
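
Steps 3 to 5 can be sketched as a small loop. This is an illustration under assumptions, not the SDK's implementation: `post` is a hypothetical transport callable that you would back with a real HTTP client, `revise` stands for your agent's own revision function, the `inline:vN` naming for later rounds extrapolates from the `inline:v2` example, and the iterate endpoint is assumed to return the same `{ run_id, verdict }` shape as the score endpoint.

```python
from typing import Callable

# Hypothetical transport: (path, JSON body) -> parsed JSON response.
Post = Callable[[str, dict], dict]

def text_part(text: str) -> list:
    """Wrap plain text as a single artifact part."""
    return [{"mime_type": "text/plain", "text": text}]

def run_loop(post: Post, revise: Callable[[str, dict], str], prompt: str, work: str,
             target_band: str = "ship", max_rounds: int = 5):
    """Score once, then revise and re-score until the band matches the gate."""
    resp = post("/api/v1/eval/feedback",
                {"modality": "text", "prompt": prompt, "artifact_parts": text_part(work)})
    run_id, verdict = resp["run_id"], resp["verdict"]
    for version in range(2, max_rounds + 1):
        if verdict["band"] == target_band:
            break
        work = revise(work, verdict)  # the agent applies flaws[] and upgrades[]
        resp = post(f"/api/v1/eval/runs/{run_id}/iterate",
                    {"decision": "reprompt", "new_artifact_ref": f"inline:v{version}",
                     "artifact_parts": text_part(work)})
        verdict = resp["verdict"]  # assumed: same shape as the score response
    return work, verdict

# Offline demo with a stubbed transport: the first verdict fails, the re-score ships.
calls = []

def fake_post(path: str, body: dict) -> dict:
    calls.append(path)
    band = "ship" if path.endswith("/iterate") else "route_to_fix"
    return {"run_id": "run-1", "verdict": {"band": band, "flaws": [], "upgrades": []}}

final_work, final_verdict = run_loop(
    fake_post, lambda work, verdict: work + " (revised)", "Draft...", "v1")
```

Injecting the transport keeps the loop testable without a key or network access; swapping `fake_post` for a function that performs authenticated HTTP calls is the only change needed to run it against a real endpoint.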

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
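
Because the secret is shown only once, an agent may want to sanity-check what it was handed before making calls. A minimal sketch, assuming the `sk-otter-<40 hex>` shape stated in step 1 is exact and that hex digits may be either case (the case is an assumption); `looks_like_eval_key` and `redact` are hypothetical helpers, not part of the SDK.

```python
import re

# Assumed exact format: "sk-otter-" followed by 40 hex digits.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-fA-F]{40}")

def looks_like_eval_key(value: str) -> bool:
    """True if the value matches the documented sk-otter-<40 hex> shape."""
    return KEY_PATTERN.fullmatch(value) is not None

def redact(value: str, keep: int = 14) -> str:
    """Keep a short prefix for logs, the same length as the key_prefix example."""
    return value[:keep] + "..."

sample = "sk-otter-" + "ab" * 20  # made-up key for illustration only
```

This only checks the shape of the string; whether the key is valid or revoked can only be learned from the API's response.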

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

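To avoid pasting the secret by hand, the same configuration can be generated from the environment. A minimal sketch: `mcp_config` is a hypothetical helper that reproduces the JSON structure shown above, and the fallback placeholder is used only when `OTTERLOOP_API_KEY` is unset.

```python
import json
import os

def mcp_config(api_url: str, api_key: str, policy_id: str) -> dict:
    """Return the same structure as the .mcp.json shown above."""
    return {"mcpServers": {"otterloop": {
        "command": "python",
        "args": ["-m", "otterloop.mcp_server"],
        "env": {
            "OTTERLOOP_API_URL": api_url,
            "OTTERLOOP_API_KEY": api_key,
            "OTTERLOOP_POLICY_ID": policy_id,
        },
    }}}

config = mcp_config(
    api_url="https://api.seaotter.ai",
    api_key=os.environ.get("OTTERLOOP_API_KEY", "sk-otter-..."),  # placeholder if unset
    policy_id="acme-prod-acceptance",
)
text = json.dumps(config, indent=2)
# Write `text` to .mcp.json when ready.
```
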
3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

VERDICT CONTRACT

The agent acts on a single schema.

The verdict is designed for frontier agents, not for screenshots of human review. It carries the score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema

{
  "score": 0.91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
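
A span anchor like the one in the schema can be mapped back to the failing text. A minimal sketch, assuming `span` holds `[start, end)` character offsets into the submitted text (the offset convention is not stated here); `failing_regions` is a hypothetical helper, and the draft and verdict below are made-up illustration data.

```python
def failing_regions(text: str, verdict: dict) -> list:
    """Return (criterion, severity, excerpt) for every flaw with a span anchor."""
    regions = []
    for flaw in verdict.get("flaws", []):
        anchor = flaw.get("anchor") or {}
        if anchor.get("kind") != "span":
            continue  # bbox, cell, page, etc. need modality-specific handling
        start, end = anchor["span"]  # assumed end-exclusive character offsets
        regions.append((flaw["criterion"], flaw["severity"], text[start:end]))
    return regions

draft = "The 98.7% uptime figure is unsourced."
verdict = {"band": "route_to_fix", "flaws": [
    {"criterion": "source_grounding", "severity": "high",
     "anchor": {"kind": "span", "span": [4, 9]}},
    # The "page" field name is hypothetical; only span anchors are handled here.
    {"criterion": "layout", "severity": "low",
     "anchor": {"kind": "page", "page": 2}},
]}
regions = failing_regions(draft, verdict)
```

Handing the agent the exact excerpt alongside the criterion lets it revise only the flagged region instead of regenerating the whole artifact.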

CONDITIONING

The critic is conditioned on your bar.

OtterLoop is not a generic "is it good?" check. The contract can condition the verdict on your organization's policy, on the prompt or intent given to the agent, and on the reference files it must respect.

Organization policy

Apply the right acceptance policy so that the same artifact can pass for one team and fail for another, each with a defensible rationale.

Prompt and intent

Bring the original request into the critic so that it judges the work against the assignment, not against a generic idealized answer.

Reference files

Brand guides, gold examples, source-of-truth documents, and previous iterations all become conditioning evidence.

Anchors localize to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model prose pretending to be a gate.
Rich returns let the same verdict drive both human review and automated revision.
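
Because the band is a discrete policy decision, an agent-side gate can be a plain comparison rather than a parse of model prose. A minimal sketch, assuming the four bands named in the quickstart (ship / route_to_fix / quarantine / block) are ordered from best to worst; that ordering and the `clears_gate` helper are assumptions, not documented behavior.

```python
# Bands in the order the quickstart lists them; assumed best -> worst.
BANDS = ["ship", "route_to_fix", "quarantine", "block"]

def clears_gate(band: str, gate: str = "ship") -> bool:
    """True if `band` is at least as good as the required `gate` band."""
    if band not in BANDS or gate not in BANDS:
        raise ValueError(f"unknown band: {band!r} or gate: {gate!r}")
    return BANDS.index(band) <= BANDS.index(gate)

results = {band: clears_gate(band, gate="route_to_fix") for band in BANDS}
```

Rejecting unknown band names instead of treating them as failures makes a contract change visible immediately rather than silently blocking every artifact.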

MODALITIES

Multimodal in. Rich multimodal out.

The same loop covers text, code, images, presentations, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include both the canonical verdict JSON and media readable by humans or agents.

Try the live demo · Browse the rubrics

MODALITY	RETURNS
Image or design frame	Annotated PNG plus flaw bounding boxes and a markdown report
Presentation, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Flagged cells, criterion-anchored notes, and structured deltas
Video or audio	Timestamp markers, captions, and localized rationales
Text or code	Span-anchored review with upgrade drafts the agent can apply
