SeaOtter

BUILD WITH OTTERLOOP

Wire review into your agent in minutes.

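OtterLoop is SeaOtter's adversarial-review contract for agents. The same loop works inside and outside AgentOS, across any framework, model, and cloud: submit work, read the verdict, revise, and iterate until the band clears your gate.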

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

INTEGRATIONS

Three copy-paste starting points. One contract.

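Everything routes to the same eval contract. The hosted API handles review execution, conditioning, feedback localization, rich returns, and signed audit records. The MCP server and Python SDK are thin wrappers over that HTTP interface.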

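  • Returns `detail`, `rationale`, and `upgrades` in the configured locale.
  • Anchors feedback to spans, cells, slides, pages, frames, or timestamps.
  • Rendered artifacts can be fetched separately when an agent needs the media bytes.
  • Keeps the canonical feedback bundle in JSON so automation always has a robust fallback.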
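As a minimal sketch of the last two points, assuming the verdict fields shown in the verdict schema on this page, an agent can act on the JSON bundle every round and queue `feedback_artifacts` refs for a separate fetch only when it needs media bytes. `split_feedback` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Any, Dict, List, Tuple


def split_feedback(verdict: Dict[str, Any], want_media: bool = False) -> Tuple[str, List[str]]:
    """Return the canonical JSON bundle plus any artifact refs to fetch separately.

    Hypothetical helper; field names follow the verdict schema example.
    """
    # The JSON bundle is always present, so automation can fall back to it.
    bundle = json.dumps(
        {key: verdict.get(key) for key in ("band", "flaws", "upgrades", "rationale")},
        ensure_ascii=False,
    )
    # Only queue rendered media when the agent can actually use the bytes.
    refs = [a["ref"] for a in verdict.get("feedback_artifacts", [])] if want_media else []
    return bundle, refs


verdict = {
    "band": "route_to_fix",
    "flaws": [{"criterion": "source_grounding", "severity": "high"}],
    "upgrades": [],
    "rationale": "...",
    "feedback_artifacts": [{"kind": "annotated_png", "ref": "artifact://..."}],
}
bundle, refs = split_feedback(verdict, want_media=True)
```

A text-only agent can call `split_feedback(verdict)` and never touch the artifact store.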


MCP

For Claude, Codex, Cursor, or any MCP-capable runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

A one-shot scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

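Because the SDK and MCP server are thin wrappers over this HTTP interface, the same call is easy to reproduce with the standard library. A minimal sketch that builds, but does not send, the request above; `build_feedback_request` is a hypothetical name, and the payload simply mirrors the curl body.

```python
import json
import os
import urllib.request

BASE_URL = "https://dev-api.seaotter.ai"  # dev base; production is https://api.seaotter.ai


def build_feedback_request(api_key: str, text: str, prompt: str,
                           policy_id: str = "acme-prod-acceptance",
                           locale: str = "ja") -> urllib.request.Request:
    """Build (without sending) the same POST the curl example makes. Hypothetical helper."""
    payload = {
        "modality": "text",
        "policy_id": policy_id,
        "locale": locale,
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
        "return_feedback_artifacts": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/eval/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_feedback_request(os.environ.get("OTTER_KEY", "sk-otter-..."),
                             text="...", prompt="Draft the Q3 incident postmortem")
# Sending it is one more line: json.load(urllib.request.urlopen(req))
```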
Python SDK

The client drives produce → score → revise until the work ships.

from otterloop import OtterLoopClient
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
# `my_agent` is your own agent: it supplies the first draft and revises it from each round's feedback.
final = otter.loop(produce=lambda feedback: my_agent.revise(feedback), work=my_agent.first_draft(),
                   modality="document", references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
                   max_rounds=5, target_band="ship")
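`my_agent` is left to you. For local dry runs, a stand-in with the two members the snippet uses (`first_draft()` and `revise(feedback)`) might look like this. `EchoAgent` is hypothetical, and it assumes `feedback` arrives as a verdict-like dict with an `upgrades` list, which this page does not specify for the SDK.

```python
from typing import Any, Dict


class EchoAgent:
    """Hypothetical stand-in for `my_agent`; swap in your real agent."""

    def __init__(self) -> None:
        self.text = "Q3 incident postmortem: draft v1"

    def first_draft(self) -> str:
        return self.text

    def revise(self, feedback: Dict[str, Any]) -> str:
        # Assumption: feedback is verdict-like, with upgrades[] entries carrying a `draft`.
        drafts = [u.get("draft", "") for u in feedback.get("upgrades", [])]
        # Append the suggested drafts so the next round scores the revised text.
        self.text = "\n".join([self.text, *drafts])
        return self.text


my_agent = EchoAgent()
revised = my_agent.revise({"upgrades": [{"draft": "Use the cited value from page 2 instead."}]})
```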

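DEVELOPER CONSOLE

Get your eval API key

Generate an eval API key for your account, then copy a ready-to-paste MCP, Python SDK, or curl setup to connect any agent to SeaOtter's adversarial review. The key is shown only once, so store it somewhere safe before you leave the page.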
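Because the secret is shown only once, it is safer to load it from the environment than to paste it into code. A minimal sketch, assuming the `sk-otter-<40 hex>` format described in the quickstart below and the `OTTERLOOP_API_KEY` variable the MCP config uses; `load_eval_key` and `masked` are hypothetical, and treating the hex as lowercase is an assumption.

```python
import os
import re
from typing import Mapping, Optional

# Assumed format: "sk-otter-" followed by 40 lowercase hex characters.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-f]{40}")


def load_eval_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the eval key from OTTERLOOP_API_KEY and reject malformed values."""
    env = os.environ if env is None else env
    key = env.get("OTTERLOOP_API_KEY", "").strip()
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError("OTTERLOOP_API_KEY is missing or not an sk-otter-<40 hex> key")
    return key


def masked(key: str) -> str:
    # Log only a short prefix (the same length as key_prefix in the mint response), never the secret.
    return key[:14] + "..."
```

Failing fast at startup beats discovering a truncated key on the first scoring call.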


AGENT QUICKSTART

After the first key, onboard an agent with no human in the loop.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. Today the only human step is minting the first eval key (above); the agent does the rest. Every eval call carries `Authorization: Bearer <sk-otter-...>`. Base URLs: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

  1. Get a key: an org user creates an account at /signup and, once signed in, mints an eval key here (`POST /api/v1/agent-keys`, body `{"name":"my-agent"}`). The full `sk-otter-<40 hex>` secret is shown exactly once. Hand it to the agent as `OTTERLOOP_API_KEY` or as the bearer token.
  2. Connect (MCP or HTTP): drop the `.mcp.json` below into Claude or Cursor (Codex reads the same server from `config.toml`) for the read-only tools (`otter_list_policies`, `otter_score`, `otter_iterate`, `otter_score_workflow`, `otter_get_feedback_artifact`), or call the HTTP API directly. Install with `pip install "otterloop[mcp]"`, or `pip install otterloop` for just the stdlib SDK.
  3. Score: `POST /api/v1/eval/feedback` with the artifact and the prompt the agent was given (plus optional `policy_id`, `locale`, `references`). You get back `{ run_id, verdict }`. The verdict has `score` (0-100; lower means more flawed), `band` (`ship` / `route_to_fix` / `quarantine` / `block`), `flaws[]`, and `upgrades[]`.
  4. Read the flaws: each flaw carries `criterion`, `severity`, `evidence`, `detail`, and an `anchor` saying where the problem is (bbox / timestamp / cell / slide / page / span). `upgrades[]` are concrete fixes the agent can apply.
  5. Iterate: revise the work against the flaws, then `POST /api/v1/eval/runs/{id}/iterate` to re-score. Repeat until `band` clears your gate (e.g. `ship`). The Python SDK's `otter.loop(produce=..., work=..., target_band="ship")` drives this automatically.
  6. Workflow / benchmark: `POST /api/v1/eval/workflows/{id}/topology` scores an end-to-end multi-step workflow graph and returns a composite, per-step, and chain critique. Discover policies (`GET /api/v1/eval/policies`) and rubrics (`GET /api/v1/eval/rubrics`) to condition grading.
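Steps 3 to 5 form a simple control loop. A minimal sketch, assuming the bands rank from best to worst in the order step 3 lists them (`ship`, `route_to_fix`, `quarantine`, `block`); `score`, `iterate`, and `revise` are hypothetical callables you would back with the HTTP calls below or with the SDK, and counting `max_rounds` as total scoring calls is an assumption.

```python
from typing import Any, Callable, Dict, Tuple

Verdict = Dict[str, Any]

# Assumption: bands ordered best to worst, as listed in step 3.
BAND_RANK = {"ship": 0, "route_to_fix": 1, "quarantine": 2, "block": 3}


def clears(band: str, gate: str) -> bool:
    """True when `band` is at least as good as the gate band."""
    return BAND_RANK[band] <= BAND_RANK[gate]


def run_until_gate(
    work: str,
    score: Callable[[str], Tuple[str, Verdict]],
    iterate: Callable[[str, str], Verdict],
    revise: Callable[[str, Verdict], str],
    gate: str = "ship",
    max_rounds: int = 5,
) -> Tuple[str, Verdict]:
    run_id, verdict = score(work)  # step 3: first grade returns run_id + verdict
    for _ in range(max_rounds - 1):
        if clears(verdict["band"], gate):
            break
        work = revise(work, verdict)  # step 4: act on flaws[] and upgrades[]
        verdict = iterate(run_id, work)  # step 5: re-score against the same run
    return work, verdict
```

Keeping the band ordering in one table makes the gate a policy choice (`gate="route_to_fix"` for a looser bar) rather than logic scattered through the loop.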

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

Step 1: mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
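Since the secret comes back exactly once, capture it straight from the response. A minimal sketch, assuming only the `key` and `key_prefix` fields visible in the example response; `read_minted_key` is hypothetical.

```python
import json
from typing import Tuple


def read_minted_key(response_body: str) -> Tuple[str, str]:
    """Pull the one-time secret and its display prefix out of the mint response."""
    data = json.loads(response_body)
    key, prefix = data["key"], data["key_prefix"]
    # Sanity check: the stored prefix should be the start of the full secret.
    if not key.startswith(prefix):
        raise ValueError("key_prefix does not match the returned key")
    return key, prefix
```

Store `key` in a secret manager immediately; only `prefix` is safe to log or show in dashboards.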

Step 2: connect over MCP (.mcp.json)

For Claude and Cursor. Codex uses an `[mcp_servers.otterloop]` table in `config.toml` instead.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

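Rather than pasting the secret into this file by hand, you can render it from the environment. A minimal sketch; `mcp_config` is a hypothetical helper that reproduces the structure above, and where you write the output is up to you.

```python
import json
import os
from typing import Any, Dict


def mcp_config(api_key: str, policy_id: str = "acme-prod-acceptance",
               api_url: str = "https://api.seaotter.ai") -> Dict[str, Any]:
    """Build the .mcp.json structure shown above from runtime values."""
    return {
        "mcpServers": {
            "otterloop": {
                "command": "python",
                "args": ["-m", "otterloop.mcp_server"],
                "env": {
                    "OTTERLOOP_API_URL": api_url,
                    "OTTERLOOP_API_KEY": api_key,
                    "OTTERLOOP_POLICY_ID": policy_id,
                },
            }
        }
    }


# Write `text` to .mcp.json and keep that file out of version control, since it holds the secret.
text = json.dumps(mcp_config(os.environ.get("OTTERLOOP_API_KEY", "sk-otter-...")), indent=2)
```

Passing `api_url="https://dev-api.seaotter.ai"` yields the dev variant shown in the integrations section.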
Step 3: score over HTTP

A one-shot grade returns a verdict plus the run_id you keep iterating on.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }
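With the response in hand, an agent usually wants the `run_id` for step 5 and the flaws ordered by urgency. A minimal sketch, assuming severities of `high`, `medium`, and `low` (only `high` appears in the verdict schema on this page); `triage` is hypothetical, and unknown severities sort last.

```python
from typing import Any, Dict, List, Tuple

# Assumption: only "high" is documented; "medium" and "low" are guesses.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def triage(response: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Return the run_id and the flaws sorted most severe first."""
    verdict = response["verdict"]
    flaws = sorted(
        verdict.get("flaws", []),
        # Unknown or missing severities rank after every known one.
        key=lambda f: SEVERITY_ORDER.get(f.get("severity", ""), len(SEVERITY_ORDER)),
    )
    return response["run_id"], flaws
```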

Step 5: iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

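The iterate body can be assembled the same way for each revision. A minimal sketch mirroring the curl example; `iterate_url` and `iterate_body` are hypothetical helpers, and bumping the `inline:v<N>` ref once per round is an assumption about how that field is meant to be used.

```python
import json
from urllib.parse import quote

BASE_URL = "https://api.seaotter.ai"


def iterate_url(run_id: str) -> str:
    # Percent-encode the run id so it always stays a single path segment.
    return f"{BASE_URL}/api/v1/eval/runs/{quote(run_id, safe='')}/iterate"


def iterate_body(revised_text: str, version: int) -> bytes:
    """JSON body mirroring the curl example; the per-round ref scheme is assumed."""
    return json.dumps({
        "decision": "reprompt",
        "new_artifact_ref": f"inline:v{version}",
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }).encode("utf-8")
```

POST `iterate_body(...)` to `iterate_url(run_id)` with the same bearer and content-type headers as the curl call.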
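VERDICT CONTRACT

Agents act on one schema.

The verdict is built for frontier agents, not for "a human reviewing screenshots." It carries the score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts the agent can use directly.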

Verdict schema

{
  "score": 0.91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
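For typed access, the schema above maps naturally onto dataclasses. A minimal sketch that mirrors only the fields shown in the example; the class and function names are hypothetical, and giving the optional-looking fields defaults is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Flaw:
    criterion: str
    severity: str
    evidence: str
    detail: str
    anchor: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Upgrade:
    action: str
    target_criterion: str
    draft: str = ""


@dataclass
class Verdict:
    score: float
    band: str
    flaws: List[Flaw]
    upgrades: List[Upgrade]
    rationale: str = ""
    decision: Optional[str] = None
    feedback_artifacts: List[Dict[str, Any]] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse verdict JSON into dataclasses, failing loudly on unexpected fields."""
    data = json.loads(raw)
    return Verdict(
        score=float(data["score"]),
        band=data["band"],
        flaws=[Flaw(**f) for f in data.get("flaws", [])],
        upgrades=[Upgrade(**u) for u in data.get("upgrades", [])],
        rationale=data.get("rationale", ""),
        decision=data.get("decision"),
        feedback_artifacts=data.get("feedback_artifacts", []),
    )
```

Building `Flaw(**f)` directly raises `TypeError` on any field not listed here, which surfaces schema drift early; filter the keys first if you would rather ignore new fields.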

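CONDITIONING

Review conditioned on your bar.

OtterLoop is not a generic "is it good?" score. The contract can bind the verdict to your organization's policy, to the prompt and intent the agent received, and to the reference files it must follow.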

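Org policy

Apply the right acceptance policy, so the same artifact can ship for one team and fail for another, with a defensible rationale either way.

Prompt and intent

Carry the original request into review, so the work is judged against the task rather than an idealized answer.

Reference files

Brand guides, gold examples, source-of-truth documents, and prior iterations can all serve as conditioning evidence.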

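  • Anchors can point to a bbox, point, span, cell, slide, page, or timestamp.
  • The band is a runtime policy decision, not a model essay dressed up as a gate.
  • Rich returns let one verdict drive both human review and machine revision.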
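Turning an anchor into a readable location is handy for logs and review UIs. A minimal sketch: only the `span` shape (`{"kind": "span", "span": [start, end]}`) appears in the verdict schema on this page, so keying every other anchor kind the same way, and a four-number `bbox` layout, are assumptions; `describe_anchor` is hypothetical.

```python
from typing import Any, Dict


def describe_anchor(anchor: Dict[str, Any]) -> str:
    """Render an anchor as a short location string.

    Assumes the value lives under a key named after `kind`, as it does for spans.
    """
    kind = anchor.get("kind")
    value = anchor.get(kind) if kind else None
    if kind == "span" and value:
        start, end = value
        return f"chars {start}-{end}"
    if kind == "bbox" and value:
        x0, y0, x1, y1 = value  # assumed corner-to-corner layout
        return f"box ({x0},{y0})-({x1},{y1})"
    if kind in {"point", "cell", "slide", "page", "timestamp"} and value is not None:
        return f"{kind} {value}"
    return "unanchored"
```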

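MODALITIES

Multimodal in. Rich multimodal out.

The same loop covers text, code, images, presentations, documents, spreadsheets, audio, video, and multi-step trajectories. Each return includes the normalized verdict JSON plus media that a person or an agent can read.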

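| Modality | Returns |
|---|---|
| Image or design frame | Annotated PNG, flaw bounding boxes, and a Markdown report |
| Presentation, PDF, or document | Annotated pages, per-page notes, and machine-readable anchors |
| Spreadsheet | Flagged cells, criterion-based explanations, and a structured diff |
| Video or audio | Timestamp markers, captions, and localized rationale |
| Text or code | Span-anchored review and ready-to-apply upgrade drafts |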

© 2026 SeaOtter. The acceptance layer for enterprise agent work.