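BUILD WITH OTTERLOOP

Wire review into your agent in minutes.

OtterLoop is SeaOtter's adversarial-review contract for agents. The same loop works inside or outside AgentOS, across any framework, model, and cloud: submit work, read the verdict, revise, and iterate until the band clears your gate.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback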

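INTEGRATIONS

Three copy-paste ways to start. One contract.

Everything routes to the same eval contract. The hosted API handles review execution, conditioning, localization, rich returns, and signed audit records. The MCP server and the Python SDK are thin wrappers over that HTTP interface.

Return `detail`, `rationale`, and `upgrades` in the requested locale.
Anchor to spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback bundle as JSON so automation has a robust fallback.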

Three ways to connect.

MCP

For Claude, Codex, Cursor, or any MCP-capable runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }
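If you would rather not paste the secret into a config file by hand, the same structure can be generated from the environment. The sketch below is only an illustration: the helper name `mcp_config` is hypothetical, and it assumes the key is available in the `OTTERLOOP_API_KEY` environment variable when none is passed in.

```python
import json
import os

def mcp_config(api_url, policy_id, api_key=None):
    """Build the .mcp.json structure shown above as a Python dict.

    Hypothetical helper: falls back to the OTTERLOOP_API_KEY environment
    variable when no key is passed explicitly.
    """
    if api_key is None:
        api_key = os.environ.get("OTTERLOOP_API_KEY", "")
    return {
        "mcpServers": {
            "otterloop": {
                "command": "python",
                "args": ["-m", "otterloop.mcp_server"],
                "env": {
                    "OTTERLOOP_API_URL": api_url,
                    "OTTERLOOP_API_KEY": api_key,
                    "OTTERLOOP_POLICY_ID": policy_id,
                },
            }
        }
    }

def write_mcp_config(path, config):
    """Write the config as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

Calling `write_mcp_config(".mcp.json", mcp_config("https://dev-api.seaotter.ai", "acme-prod-acceptance"))` would produce a file equivalent to the snippet above.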

curl

A single scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'
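For agents that cannot shell out to curl, the same call can be made with the Python standard library. This is a minimal sketch that mirrors the request body above; the function names `build_feedback_request` and `send` are illustrative and are not part of the OtterLoop SDK.

```python
import json
import urllib.request

API_URL = "https://dev-api.seaotter.ai"  # dev base used in the curl example

def build_feedback_request(api_key, prompt, text, policy_id=None, locale=None):
    """Build (but do not send) a POST to /api/v1/eval/feedback."""
    body = {
        "modality": "text",
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
        "return_feedback_artifacts": True,
    }
    # policy_id and locale are optional, so include them only when set.
    if policy_id is not None:
        body["policy_id"] = policy_id
    if locale is not None:
        body["locale"] = locale
    return urllib.request.Request(
        API_URL + "/api/v1/eval/feedback",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from `send` lets you inspect or log the payload before anything goes over the network.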

Python SDK

The client drives produce → score → revise until the work ships.

from otterloop import OtterLoopClient
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback), work=my_agent.first_draft(),
    modality="document", references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5, target_band="ship")
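To make the control flow concrete, here is a rough picture of what a produce, score, revise loop does. It is not the SDK's actual implementation: `run_loop` and its `score` callable are hypothetical, and it assumes `max_rounds` counts the total number of scoring calls.

```python
def run_loop(score, produce, work, target_band="ship", max_rounds=5):
    """Score `work`, revise it with `produce`, and stop once the band clears.

    `score(work)` returns a verdict dict with at least a "band" key.
    `produce(verdict)` returns the revised work for the next round.
    Assumption: `max_rounds` is the total number of scoring calls.
    """
    verdict = score(work)
    for _ in range(max_rounds - 1):
        if verdict["band"] == target_band:
            break
        work = produce(verdict)   # revise against the latest feedback
        verdict = score(work)     # re-score the revision
    return work, verdict
```

The loop returns the last work and verdict even when the target band is never reached, so the caller can decide whether to escalate or stop.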

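DEVELOPER CONSOLE

Get your eval API key

Generate an eval API key for your account, then copy a ready-to-paste MCP, Python SDK, or curl setup to connect any agent to SeaOtter's adversarial review. The key is shown only once, so save it before you leave the page.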

AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key — A signed-in org user creates an account at /signup, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY / the bearer token.
2. Connect (MCP or HTTP) — Drop the .mcp.json below into Claude / Codex / Cursor for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score — POST /api/v1/eval/feedback with the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws — Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate — Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark — POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite + per-step + chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.
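Step 4 is the one step without a request of its own, so here is a small sketch of how an agent might turn `flaws[]` into an ordered revision list. The severity ranking is an assumption (only "high" appears on this page), and `revision_notes` is a hypothetical helper name.

```python
# Assumed severity ranking; only "high" appears in the verdict example.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def revision_notes(verdict):
    """Return one plain-text note per flaw, most severe first."""
    flaws = sorted(
        verdict.get("flaws", []),
        # Unknown or missing severities sort after the known ones.
        key=lambda flaw: SEVERITY_ORDER.get(flaw.get("severity"), len(SEVERITY_ORDER)),
    )
    notes = []
    for flaw in flaws:
        anchor = flaw.get("anchor", {})
        notes.append(
            "[{severity}] {criterion} at {kind}: {detail}".format(
                severity=flaw.get("severity", "unknown"),
                criterion=flaw.get("criterion", "unknown"),
                kind=anchor.get("kind", "unanchored"),
                detail=flaw.get("detail", ""),
            )
        )
    return notes
```

The notes can be fed straight into the agent's revision prompt before the next iterate call.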

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
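Because the secret is shown only once, it can be worth checking that what you stored has the documented `sk-otter-<40 hex>` shape before handing it to an agent. A minimal sketch follows; the helper name is hypothetical, and accepting uppercase hex digits is an assumption.

```python
import re

# "sk-otter-" followed by 40 hex characters, per the quickstart text.
# Accepting uppercase hex digits is an assumption of this sketch.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-fA-F]{40}")

def looks_like_eval_key(value):
    """Return True when `value` has the documented sk-otter-<40 hex> shape."""
    return KEY_PATTERN.fullmatch(value) is not None
```

This only checks the format. It says nothing about whether the key is valid or active on the server.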

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'
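The iterate call can be built the same way in Python. This sketch copies the body from the curl example; the `inline:vN` naming for later revisions is extrapolated from the single `inline:v2` value shown and may not match what the API expects, and `build_iterate_request` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request

def build_iterate_request(base_url, api_key, run_id, revised_text, version):
    """Build (but do not send) a POST to /api/v1/eval/runs/{id}/iterate."""
    body = {
        "decision": "reprompt",
        # Assumed naming scheme, extrapolated from "inline:v2".
        "new_artifact_ref": "inline:v%d" % version,
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }
    url = "%s/api/v1/eval/runs/%s/iterate" % (
        base_url.rstrip("/"),
        urllib.parse.quote(run_id, safe=""),  # keep the run id path-safe
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```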

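VERDICT CONTRACT

Agents act on one schema.

The verdict is designed for frontier agents, not for "a human reviewing a screenshot." It includes the score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema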

{
  "score": 91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
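On the agent side, it can help to parse the verdict into typed objects rather than passing raw dicts around. The sketch below covers only the fields shown in the example above; the class names and the tolerance for missing optional fields are assumptions, not part of the contract.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Flaw:
    criterion: str
    severity: str
    evidence: str
    detail: str
    anchor: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Verdict:
    score: float
    band: str
    flaws: List[Flaw] = field(default_factory=list)
    upgrades: List[Dict[str, Any]] = field(default_factory=list)
    rationale: str = ""

def parse_verdict(raw):
    """Parse a verdict from a JSON string/bytes or an already-decoded dict."""
    data = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    flaws = [
        Flaw(
            criterion=item.get("criterion", ""),
            severity=item.get("severity", ""),
            evidence=item.get("evidence", ""),
            detail=item.get("detail", ""),
            anchor=item.get("anchor") or {},
        )
        for item in data.get("flaws", [])
    ]
    return Verdict(
        score=data["score"],  # score and band are treated as required here
        band=data["band"],
        flaws=flaws,
        upgrades=data.get("upgrades", []),
        rationale=data.get("rationale", ""),
    )
```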

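CONDITIONING

Review is conditioned on your bar.

OtterLoop is not a generic "is it good?" score. The contract can bind the verdict to your organization's policy, the prompt or intent the agent received, and the reference files it must follow.

Organization policy

Apply the right acceptance policy, so the same artifact can ship for one team and fail for another, with a defensible rationale.

Prompt and intent

Carry the original ask into the review, so it judges against the task rather than an ideal answer.

Reference files

Brand guides, gold examples, source-of-truth documents, and prior iterations can all serve as conditioning evidence.

Anchors can point to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not a model essay dressed up as a gate.
Rich returns let the same verdict drive both human review and machine revision.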
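Since the band is a discrete policy decision, a gate check in agent code can be a simple comparison rather than text parsing. The sketch below assumes the four bands named in the quickstart are ordered from best to worst in the order they are listed; that ranking, and the `clears_gate` name, are assumptions.

```python
# Bands as listed in the quickstart. Treating the listing order as a
# best-to-worst ranking is an assumption of this sketch.
BANDS = ["ship", "route_to_fix", "quarantine", "block"]

def clears_gate(band, gate="ship"):
    """Return True when `band` is at least as good as `gate`.

    Raises ValueError for an unknown band, so an unexpected value
    fails closed instead of slipping through the gate.
    """
    return BANDS.index(band) <= BANDS.index(gate)
```

With the default gate, only `ship` passes. A looser gate such as `clears_gate(band, gate="route_to_fix")` would also let `route_to_fix` through.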

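MODALITIES

Multimodal in. Rich multimodal out.

The same loop covers text, code, images, presentations, documents, spreadsheets, audio, video, and multi-step trajectories. Returns include both the normalized verdict JSON and media that a human or an agent can read.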

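Modality	Returns
Image or design frame	Annotated PNG, flaw bounding boxes, and a Markdown report
Presentation, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Flagged cells, criterion-based notes, and a structured diff
Video or audio	Timestamp markers, captions, and localized rationale
Text or code	Span-anchored review and directly applicable upgrade drafts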
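Because each modality anchors feedback differently, an agent usually needs one place that turns an anchor into a readable location. Only the span anchor's shape (`{"kind": "span", "span": [418, 462]}`) appears on this page, so the sketch below assumes every other kind stores its value under a key named after the kind. Treat that layout, and the `describe_anchor` name, as hypothetical.

```python
def describe_anchor(anchor):
    """Turn an anchor dict into a short human-readable location.

    Assumption: the value lives under a key named after `kind`,
    as in {"kind": "span", "span": [418, 462]}.
    """
    kind = anchor.get("kind")
    if kind is None:
        return "no anchor"
    value = anchor.get(kind)
    if kind == "span" and isinstance(value, (list, tuple)) and len(value) == 2:
        return "span %s-%s" % (value[0], value[1])
    if value is None:
        return kind  # kind is known but carries no value we recognise
    return "%s %s" % (kind, value)
```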
