BUILD WITH OTTERLOOP

Connect the critic to an agent in minutes.

OtterLoop is the agent-facing contract for SeaOtter's hostile critic. The same loop works with or without AgentOS, on any framework, model, and cloud: submit the work, read the verdict, revise, and iterate until the band clears your gate.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

INTEGRATION

Three copy-and-paste starts. One contract.

Everything routes to the same evaluation contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit log. The MCP server and the Python SDK are thin wrappers over that HTTP surface.

Localize `detail`, `rationale`, and `upgrades` by locale.
Anchor to spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback package in JSON for fallback-safe automation.

INTEGRATION

Three entry paths.

MCP

Use in Claude, Codex, Cursor, or any MCP-compatible runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

One-shot scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

Python SDK

Let the client drive produce → evaluate → revise until it ships.

from otterloop import OtterLoopClient

# my_agent is your own agent object; it is not provided by the SDK.
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5,
    target_band="ship",
)
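
To make the produce → evaluate → revise cycle concrete, here is a minimal sketch written against a stand-in scorer. It is not the SDK's implementation: `run_loop`, `fake_score`, and the simple equality check on `band` are assumptions made for illustration only.

```python
from typing import Callable


def run_loop(score: Callable[[str], dict],
             produce: Callable[[dict], str],
             work: str,
             target_band: str = "ship",
             max_rounds: int = 5) -> tuple:
    """Score `work`, revise from the verdict, repeat until the band matches."""
    verdict = score(work)
    for _ in range(max_rounds - 1):  # at most max_rounds scoring calls in total
        if verdict["band"] == target_band:
            break
        work = produce(verdict)      # the agent revises using the verdict
        verdict = score(work)
    return work, verdict


def fake_score(text: str) -> dict:
    """Stand-in critic (hypothetical): ships only when the draft cites a source."""
    if "cited" in text:
        return {"band": "ship", "flaws": []}
    return {"band": "route_to_fix", "flaws": [{"criterion": "source_grounding"}]}


final, verdict = run_loop(fake_score,
                          lambda v: "draft with a cited figure",
                          "first draft")
print(final, verdict["band"])
```

Capping the number of scoring calls keeps an agent that never converges from looping forever; the caller can inspect the last verdict to decide what to do next.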

DEVELOPER CONSOLE

Get your evaluation API key

Generate an evaluation API key for your account and copy a ready-made MCP, Python SDK, or curl configuration to connect any agent to SeaOtter's hostile critic. The secret is shown once, so store it before leaving this page.

AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).
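
As a rough illustration of the bearer-token convention and the two base URLs, the sketch below builds an authenticated request with Python's standard library without sending it. The helper name `eval_request` and the `BASES` mapping are assumptions introduced here; only the URLs, the header, and the payload fields come from this page.

```python
import json
import urllib.request

# Base URLs as listed above.
BASES = {"prod": "https://api.seaotter.ai", "dev": "https://dev-api.seaotter.ai"}


def eval_request(env: str, path: str, api_key: str,
                 payload: dict) -> urllib.request.Request:
    """Build (without sending) an authenticated JSON POST to the eval API."""
    return urllib.request.Request(
        BASES[env] + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = eval_request("dev", "/api/v1/eval/feedback", "sk-otter-...", {
    "modality": "text",
    "prompt": "Draft the Q3 incident postmortem",
    "artifact_parts": [{"mime_type": "text/plain", "text": "..."}],
})
print(req.get_method(), req.full_url)
# Sending it is one more call: urllib.request.urlopen(req)
```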

1. Get a key: An org user creates an account at /signup, signs in, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY or as the bearer token.
2. Connect (MCP or HTTP): Drop the .mcp.json below into Claude or Cursor (Codex uses config.toml) for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. Install with pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score: POST /api/v1/eval/feedback with the artifact and the prompt the agent was given (plus optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], and upgrades[].
4. Read the flaws: Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate: Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark: POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite, per-step, and chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
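
One way an agent might decide whether a band "clears the gate" in step 5 is sketched below. The best-to-worst ordering is an assumption inferred from the order in which the bands are listed in step 3; the page does not state it explicitly, and `clears_gate` is a hypothetical helper.

```python
# Assumed ordering, best to worst, taken from the order the bands are listed.
BAND_ORDER = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(band: str, gate: str = "ship") -> bool:
    """True when `band` is at least as good as `gate` under the assumed order."""
    return BAND_ORDER.index(band) <= BAND_ORDER.index(gate)


print(clears_gate("ship"))                          # strict gate, best band
print(clears_gate("route_to_fix"))                  # strict gate, weaker band
print(clears_gate("route_to_fix", "route_to_fix"))  # looser gate
```

An unknown band raises `ValueError` from `list.index`, which is a deliberate choice here: failing loudly is safer than silently treating an unrecognized band as passing.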

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
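
Because the secret is returned only once, an agent bootstrap script may want to check that it captured the whole key before storing it. The sketch below assumes the `sk-otter-<40 hex>` shape described above; `extract_key` is a hypothetical helper, the sample key is made up, and accepting upper-case hex digits is an assumption.

```python
import re

# "sk-otter-" followed by 40 hex characters, per the description above.
KEY_RE = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def extract_key(mint_response: dict) -> str:
    """Return the secret from a mint response, or raise if it looks truncated."""
    key = mint_response.get("key", "")
    if not KEY_RE.fullmatch(key):
        raise ValueError("response does not contain a full sk-otter key")
    return key


sample = {"id": "example", "key": "sk-otter-" + "0a" * 20}  # made-up key
print(len(extract_key(sample)))
```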

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }
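
Step 4 has no request of its own: the agent simply reads the verdict it got back. A minimal sketch of that read, assuming the response shape shown in the comment above, follows. The `summarize` helper and the sample values (run ID, extra criteria, the "low" severity) are illustrative, not real API output.

```python
from collections import Counter


def summarize(response: dict) -> dict:
    """Reduce a feedback response to the fields a gate usually needs."""
    verdict = response["verdict"]
    severities = Counter(f.get("severity", "unknown")
                         for f in verdict.get("flaws", []))
    return {"run_id": response["run_id"],
            "band": verdict["band"],
            "flaws_by_severity": dict(severities)}


# Hand-written sample in the documented shape.
sample = {"run_id": "run-123", "verdict": {"band": "route_to_fix", "flaws": [
    {"criterion": "source_grounding", "severity": "high"},
    {"criterion": "tone", "severity": "low"},
    {"criterion": "structure", "severity": "high"},
]}}
print(summarize(sample))
```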

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'
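
The same iterate call can be prepared from Python. The sketch below only assembles the URL and JSON body; it mirrors the fields in the curl example, while the `iterate_call` helper and the idea of numbering revisions as `inline:v<N>` are assumptions extrapolated from the single `inline:v2` value shown.

```python
import json
from urllib.parse import quote


def iterate_call(run_id: str, revised_text: str, version: int,
                 base: str = "https://api.seaotter.ai") -> tuple:
    """Return (url, json_body) for re-scoring a revision of an existing run."""
    url = f"{base}/api/v1/eval/runs/{quote(run_id, safe='')}/iterate"
    body = {
        "decision": "reprompt",
        "new_artifact_ref": f"inline:v{version}",  # assumed naming scheme
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }
    return url, json.dumps(body)


url, body = iterate_call("run-123", "...revised work...", version=2)
print(url)
```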

VERDICT CONTRACT

The agent acts on a single schema.

The verdict is designed for frontier agents, not screenshots of human review. It carries score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema

{
  "score": 91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
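
To show how an agent might act on this schema, the sketch below pairs each flaw with the upgrades that target the same criterion. Matching `target_criterion` to `criterion` is an assumption suggested by the example above, and `revision_plan` is a hypothetical helper.

```python
def revision_plan(verdict: dict) -> list:
    """Pair every flaw with the upgrade drafts aimed at its criterion."""
    by_criterion = {}
    for upgrade in verdict.get("upgrades", []):
        by_criterion.setdefault(upgrade["target_criterion"], []).append(upgrade)
    plan = []
    for flaw in verdict.get("flaws", []):
        fixes = by_criterion.get(flaw["criterion"], [])
        plan.append({
            "criterion": flaw["criterion"],
            "severity": flaw.get("severity"),
            "anchor": flaw.get("anchor"),
            "fixes": [u["draft"] for u in fixes],
        })
    return plan


# Fields copied from the schema example above.
verdict = {
    "flaws": [{"criterion": "source_grounding", "severity": "high",
               "anchor": {"kind": "span", "span": [418, 462]}}],
    "upgrades": [{"action": "Replace the unsupported figure",
                  "target_criterion": "source_grounding",
                  "draft": "Use the cited value from page 2 instead."}],
}
for step in revision_plan(verdict):
    print(step["criterion"], step["anchor"], step["fixes"])
```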

CONDITIONING

The critic is conditional on your bar.

OtterLoop is not a generic "is this good". The contract can condition the verdict on your organization's policy, on the prompt or intent given to the agent, and on the reference files it must obey.

Organization policy

Apply the right acceptance policy so the same artifact can ship for one team and fail for another with a defensible reason.

Prompt and intent

Bring the original request to the critic so it judges the work against the ask, not against a generic idealized answer.

Reference files

Brand guide, gold examples, source-of-truth documents, and prior iterations become conditioning evidence.

Anchors localize to bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model prose pretending to be a gate.
Rich returns let the same verdict serve human review and machine review.
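
As an illustration of what a localized anchor gives the agent, the sketch below marks the failing region of a text artifact from a span anchor. It assumes the two numbers in `span` are half-open character offsets (start inclusive, end exclusive); the page does not specify the offset convention, and `mark_span` is a hypothetical helper.

```python
def mark_span(text: str, anchor: dict) -> str:
    """Wrap the anchored region of `text` in [[ ]] markers."""
    if anchor.get("kind") != "span":
        raise ValueError("this sketch only handles span anchors")
    start, end = anchor["span"]  # assumed: character offsets, end exclusive
    if not 0 <= start <= end <= len(text):
        raise ValueError("span is outside the text")
    return text[:start] + "[[" + text[start:end] + "]]" + text[end:]


draft = "Revenue grew 40% in Q3."
print(mark_span(draft, {"kind": "span", "span": [13, 16]}))
```

Other anchor kinds (bbox, cell, slide, page, timestamp) would need their own handlers, whose field layouts are not shown on this page.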

MODALITIES

Multimodal in. Rich multimodal out.

The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include both the canonical verdict JSON and media that a human or an agent can read.

MODALITY	RETURNS
Image or design frame	Annotated PNG plus flaw bounding boxes and a markdown report
Deck, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Flagged cells, criterion-anchored notes, and structured deltas
Video or audio	Timestamp markers, captions, and localized rationale
Text or code	Span-anchored review with upgrade drafts the agent can apply
