以 OTTERLOOP 构建
OtterLoop 是面向智能体的 SeaOtter 敌意评审契约。同一循环可在 AgentOS 内外工作,跨任意框架、模型与云:提交工作、读取裁定、修订并迭代,直到分档跨过您的闸门。
集成
一切均路由到同一评测契约。托管 API 负责评审执行、条件设置、定位化、富返回与签名审计记录。MCP 服务器与 Python SDK 是该 HTTP 接口的轻薄封装。
集成
一切均路由到同一评测契约。托管 API 负责评审执行、条件设置、定位化、富返回与签名审计记录。MCP 服务器与 Python SDK 是该 HTTP 接口的轻薄封装。
MCP
用于 Claude、Codex、Cursor 或任意支持 MCP 的运行时。
{ "mcpServers": { "otterloop": {
"command": "python", "args": ["-m", "otterloop.mcp_server"],
"env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
"OTTERLOOP_API_KEY": "sk-otter-...",
"OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }curl
通过 HTTP 的单次评分调用。
curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"..."}],
"return_feedback_artifacts": true }'Python SDK
由客户端驱动 产出 → 评分 → 修订,直到放行。
from otterloop import OtterLoopClient otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja") final = otter.loop(produce=lambda feedback: my_agent.revise(feedback), work=my_agent.first_draft(), modality="document", references=["file://brand-guide.pdf", "file://gold-postmortem.md"], max_rounds=5, target_band="ship")
开发者控制台
为您的账户生成评测 API 密钥,然后复制一份可直接粘贴的 MCP、Python SDK 或 curl 设置,将任意智能体接入 SeaOtter 的敌意评审。该密钥仅展示一次——请在离开页面前妥善保存。
正在加载密钥…
AGENT QUICKSTART
SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).
Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.
1 — mint a key (one-time, signed-in user)
Returns the full sk-otter-... secret exactly once.
curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
-H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
-d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }2 — connect over MCP (.mcp.json)
Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.
{ "mcpServers": { "otterloop": {
"command": "python", "args": ["-m", "otterloop.mcp_server"],
"env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
"OTTERLOOP_API_KEY": "sk-otter-...",
"OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }3 — score over HTTP
One-shot grade -> verdict + run_id to keep iterating.
curl -s https://api.seaotter.ai/api/v1/eval/feedback \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
"prompt":"Draft the Q3 incident postmortem",
"artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
"return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }5 — iterate until it ships
Re-score a revision against the same run.
curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
-H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
-d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
"artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'裁定契约
该裁定为前沿智能体而设计,而非“人工复核截图”。其包含评分、分档、缺陷、升级建议、锚点、论证,以及智能体可直接使用的富反馈工件引用。
裁定架构
{
"score": 0.91,
"band": "ship",
"decision": "ship",
"flaws": [
{ "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
],
"upgrades": [
{ "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
],
"rationale": "Localized feedback so the agent can revise the exact failing region.",
"feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}条件设置
OtterLoop 不是泛化的“好不好”评分。该契约可将裁定绑定到您组织的策略、智能体收到的提示/意图,以及其必须遵循的参考文件。
应用正确的验收策略,使同一工件能对一个团队放行、对另一个团队失败,并具备可辩护理由。
将原始诉求携入评审,使其依据任务而非理想答案进行判断。
品牌手册、黄金示例、真实来源文档与历史迭代都可成为条件证据。