Build with OTTERLOOP

Wire the critic into your agent in minutes.

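OtterLoop is the agent-facing contract for SeaOtter's adversarial critic. The same loop works with or without AgentOS, on any framework, model, or cloud: submit an artifact, read the verdict, revise, and repeat until the work clears the gate.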

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

Integrations

Three copy-paste patterns. One contract.

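Every path routes to the same evaluation contract. The hosted API runs the critic and handles conditioning, localization, rich returns, and signed audit records; the MCP server and the Python SDK are thin wrappers over that HTTP surface.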

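Localizes `detail`, `rationale`, and `upgrades` per locale.
Anchors feedback to spans, cells, slides, pages, frames, and timestamps.
Fetches rendered artifacts separately when you need the media bytes.
Keeps the canonical feedback bundle in JSON, so automation always has a robust fallback.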


Three ways in.

MCP

Use it from Claude, Codex, Cursor, or any MCP-capable runtime.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

A one-shot grading call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'
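For agents that would rather stay in Python than shell out to curl, the same one-shot call can be built with the standard library. This is a minimal sketch that assumes the request body shown in the curl example; `build_feedback_request` is a hypothetical helper, not part of the otterloop SDK.

```python
import json
import urllib.request

DEV_BASE = "https://dev-api.seaotter.ai"  # dev base URL from the curl example


def build_feedback_request(api_key, prompt, text, *, policy_id=None,
                           locale=None, base=DEV_BASE):
    """Build (but do not send) a POST to /api/v1/eval/feedback.

    Hypothetical helper that mirrors the curl body above for a text artifact.
    """
    body = {
        "modality": "text",
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
        "return_feedback_artifacts": True,
    }
    # policy_id and locale are optional conditioning fields.
    if policy_id is not None:
        body["policy_id"] = policy_id
    if locale is not None:
        body["locale"] = locale
    return urllib.request.Request(
        f"{base}/api/v1/eval/feedback",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`; building and sending are kept separate so the payload can be inspected or logged first.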

Python SDK

Let the client drive produce → grade → revise until the work reaches ship.

from otterloop import OtterLoopClient
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(produce=lambda feedback: my_agent.revise(feedback),  # my_agent: your own agent
                   work=my_agent.first_draft(), modality="document",
                   references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
                   max_rounds=5, target_band="ship")
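Conceptually, `otter.loop` runs the produce → grade → revise cycle by itself. The sketch below is not the SDK's implementation. It assumes a `grade(work)` callable that returns a verdict dict with a `band` field, and it assumes the bands rank ship > route_to_fix > quarantine > block. `run_loop` and `clears_gate` are hypothetical names.

```python
# Assumed ordering of bands, best to worst.
BAND_RANK = {"ship": 0, "route_to_fix": 1, "quarantine": 2, "block": 3}


def clears_gate(band, target_band="ship"):
    """True if `band` is at least as good as `target_band` (assumed ranking)."""
    return BAND_RANK[band] <= BAND_RANK[target_band]


def run_loop(grade, produce, work, max_rounds=5, target_band="ship"):
    """Grade `work`, revise with `produce(verdict)`, repeat until the gate clears.

    Returns (final_work, verdict_for_final_work, rounds_graded).
    """
    verdict = grade(work)
    rounds = 1
    while not clears_gate(verdict["band"], target_band) and rounds < max_rounds:
        work = produce(verdict)  # the agent revises against the latest flaws
        verdict = grade(work)    # re-score the revision
        rounds += 1
    return work, verdict, rounds
```

Grading after every revision, rather than revising after the last grade, keeps the returned work and its verdict in sync even when `max_rounds` runs out.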

Developer console

Get an eval API key

Mint an eval API key for your account, copy the ready-made MCP, Python SDK, or curl config, and connect any agent to SeaOtter's adversarial critic. The secret is shown only once, so store it before you leave the page.


AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above); the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Base URLs: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key — A signed-in org user creates an account at /signup, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY / the bearer token.
2. Connect (MCP or HTTP) — Drop the .mcp.json below into Claude / Codex / Cursor for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score — POST /api/v1/eval/feedback with the artifact + the prompt the agent was given (+ optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws — Each flaw carries criterion, severity, evidence, detail, and an anchor (where: bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate — Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark — POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite + per-step + chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
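Steps 4 and 5 hinge on turning the verdict into something the agent can act on. A minimal sketch, assuming the fields named above (`flaws[]` with `criterion`, `severity`, `detail`, `anchor`; `upgrades[]` with `action`, `target_criterion`) and assuming the anchor's location is stored under a key named after its `kind`, as in the verdict schema example. `revision_brief` and the severity ordering are hypothetical.

```python
# Assumed severity labels; only "high" appears in the verdict schema example.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def describe_anchor(anchor):
    """Render an anchor such as {"kind": "span", "span": [418, 462]}."""
    if not anchor:
        return "unanchored"
    kind = anchor.get("kind", "?")
    location = anchor.get(kind)  # assumes the location is keyed by its kind
    return f"{kind} {location}" if location is not None else kind


def revision_brief(verdict):
    """Turn flaws and upgrades into a plain-text brief, most severe flaws first."""
    flaws = sorted(verdict.get("flaws", []),
                   key=lambda f: SEVERITY_ORDER.get(f.get("severity"), 99))
    lines = [f"[{f.get('severity', '?')}] {f['criterion']} at "
             f"{describe_anchor(f.get('anchor'))}: {f.get('detail', '')}"
             for f in flaws]
    lines += [f"FIX ({u.get('target_criterion', '?')}): {u['action']}"
              for u in verdict.get("upgrades", [])]
    return "\n".join(lines)
```

The brief can be passed to the agent as the `feedback` argument of its revise step; unknown severities sort last instead of raising.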

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
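Because the secret is shown exactly once, it is worth checking its shape before storing it and logging only the prefix. A minimal sketch, assuming the `sk-otter-<40 hex>` format from step 1 and a 14-character prefix like the `key_prefix` in the sample response; `check_key` and `mask_key` are hypothetical helpers.

```python
import re

# Format from the quickstart: "sk-otter-" followed by 40 hex characters.
KEY_RE = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def check_key(key):
    """Return `key` unchanged, or raise ValueError if it is not an eval key."""
    if not KEY_RE.fullmatch(key):
        raise ValueError("not an sk-otter-<40 hex> eval key")
    return key


def mask_key(key, visible=14):
    """Keep only the prefix (e.g. 'sk-otter-abcde') for logs."""
    return key[:visible] + "..."
```

A typical use is `key = check_key(response["key"])`, then writing `key` to a secret store and exporting it as `OTTERLOOP_API_KEY`, while logs only ever see `mask_key(key)`.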

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }
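Codex reads the server from `[mcp_servers.otterloop]` in `config.toml` rather than from `.mcp.json`. Assuming Codex accepts the same `command`, `args`, and `env` keys as the JSON above (check the Codex documentation), a small helper can render the TOML table from the JSON so the two configs stay in sync. `to_codex_toml` is hypothetical.

```python
import json


def to_codex_toml(mcp_json, name="otterloop"):
    """Render one server from an .mcp.json document as a Codex TOML table.

    Assumes Codex uses the same command/args/env keys. Values are emitted with
    json.dumps, whose quoting matches TOML basic strings for plain ASCII text.
    """
    server = json.loads(mcp_json)["mcpServers"][name]
    lines = [f"[mcp_servers.{name}]",
             f"command = {json.dumps(server['command'])}",
             f"args = {json.dumps(server.get('args', []))}"]
    env = server.get("env", {})
    if env:
        lines.append(f"[mcp_servers.{name}.env]")
        lines += [f"{key} = {json.dumps(value)}" for key, value in env.items()]
    return "\n".join(lines) + "\n"
```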

3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'
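The iterate call can be built from Python in the same way. This sketch assumes the body fields shown in the curl example (`decision`, `new_artifact_ref`, `artifact_parts`), assumes the `inline:vN` naming from that example, and takes `run_id` from the scoring response; `build_iterate_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

PROD_BASE = "https://api.seaotter.ai"


def build_iterate_request(api_key, run_id, revised_text, version,
                          base=PROD_BASE):
    """Build (not send) POST /api/v1/eval/runs/{run_id}/iterate for a text revision."""
    body = {
        "decision": "reprompt",
        "new_artifact_ref": f"inline:v{version}",  # e.g. "inline:v2"
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }
    # Percent-encode the run id so it is always a single path segment.
    path = f"/api/v1/eval/runs/{urllib.parse.quote(run_id, safe='')}/iterate"
    return urllib.request.Request(
        base + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Bumping `version` on every revision keeps each artifact reference distinct within the run.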

Verdict contract

Agents act on a single schema.

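Verdicts are designed for frontier agents, not as screenshots for human review. Each one carries a score, a band, flaws, upgrades, anchors, a rationale, and references to rich feedback artifacts that agents can consume directly.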

Verdict schema

{
  "score": 0.91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
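Loading the verdict into typed objects makes downstream code fail early on a malformed payload. A minimal sketch: field names follow the example above, while the class names and the choice to make everything except `score` and `band` optional are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Flaw:
    criterion: str
    severity: str
    evidence: str = ""
    detail: str = ""
    anchor: Optional[dict[str, Any]] = None  # e.g. {"kind": "span", "span": [a, b]}


@dataclass
class Upgrade:
    action: str
    target_criterion: str = ""
    draft: str = ""


@dataclass
class Verdict:
    score: float
    band: str
    decision: str = ""
    flaws: list[Flaw] = field(default_factory=list)
    upgrades: list[Upgrade] = field(default_factory=list)
    rationale: str = ""
    feedback_artifacts: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        """Build a Verdict from decoded JSON.

        Raises KeyError if score or band is missing, and TypeError if a flaw or
        upgrade has missing required fields or unexpected ones.
        """
        return cls(
            score=float(data["score"]),
            band=data["band"],
            decision=data.get("decision", ""),
            flaws=[Flaw(**f) for f in data.get("flaws", [])],
            upgrades=[Upgrade(**u) for u in data.get("upgrades", [])],
            rationale=data.get("rationale", ""),
            feedback_artifacts=list(data.get("feedback_artifacts", [])),
        )
```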

Conditioning

The critic is conditioned on your standards.

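OtterLoop is not a generic good-or-bad score. Verdicts can be conditioned on your organization's policies, the prompt or intent the agent was given, and the reference files the work must follow.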

Organization policy

Apply the right acceptance policy. When the same artifact passes for one team and fails for another, you can show why in a defensible way.

Prompt and intent

Give the critic the original request so it grades the work against the task, not against an idealized generic answer.

Reference files

Brand guides, gold examples, authoritative source documents, and previous iterations all serve as conditioning evidence.

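Anchors localize feedback to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model prose dressed up as a gate.
Rich returns let the same verdict drive both human review and automated fixes.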
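For text and code, a span anchor lets the agent pull out exactly the failing region before revising it. A minimal sketch, assuming `span` holds `[start, end)` character offsets into the submitted text (the offset unit and end-exclusivity are assumptions); other anchor kinds return `None` and are left for the agent or a human to handle. `failing_region` is a hypothetical helper.

```python
def failing_region(text, anchor, context=0):
    """Return the text under a span anchor, plus `context` chars on each side.

    Assumes anchor["span"] = [start, end) character offsets; returns None for
    non-span anchors (bbox, point, cell, slide, page, timestamp).
    """
    if not anchor or anchor.get("kind") != "span":
        return None
    start, end = anchor["span"]
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"span {start}..{end} is outside the text")
    return text[max(0, start - context):min(len(text), end + context)]
```

For example, with the text `"Revenue grew 40% in Q3."`, the anchor `{"kind": "span", "span": [13, 16]}` selects `"40%"`.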

Modalities

Multimodal in. Rich multimodal out.

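The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include media that humans and agents can read, alongside the canonical verdict JSON.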


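Modality	Returns
Image or design frame	Annotated PNG, flaw bounding boxes, Markdown report
Deck, PDF, or document	Annotated pages, per-page notes, machine-readable anchors
Spreadsheet	Flagged cells, criterion-based notes, structured diff
Video or audio	Timestamp markers, captions, localized rationale
Text or code	Span-anchored review and upgrade suggestions the agent can apply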
