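BUILD WITH OTTERLOOP

Connect the critic to your agent in minutes.

OtterLoop is the agent-facing contract for SeaOtter's adversarial critic. The same loop works on or off AgentOS and with any framework, model, or cloud: submit work, read the verdict, revise, and repeat until the band clears your gate.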

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

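Integrations

Three copy-paste starts. One contract.

Everything routes to the same evaluation contract. The hosted API handles critic execution, conditioning, localization, rich returns, and signed audit records. The MCP server and the Python SDK are thin wrappers over that HTTP surface.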

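Localize `detail`, `rationale`, and `upgrades` per locale.
Anchor feedback to spans, cells, slides, pages, frames, and timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback bundle as JSON for automation fallback.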
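
The last two points above can be sketched as a small helper that keeps the JSON bundle for automation and sets the rendered-artifact references aside to fetch only when needed. This is a minimal sketch: `split_feedback` is a hypothetical name, the `feedback_artifacts` key follows the verdict schema on this page, and the `artifact://example-1` reference is made up for illustration.

```python
def split_feedback(verdict: dict) -> tuple[dict, list[str]]:
    """Separate the JSON feedback bundle from rendered-artifact references."""
    # References to rendered media (for example annotated PNGs) to fetch later.
    refs = [item["ref"] for item in verdict.get("feedback_artifacts", [])]
    # Everything else stays as plain JSON for automation fallback.
    bundle = {key: value for key, value in verdict.items() if key != "feedback_artifacts"}
    return bundle, refs


# Hypothetical verdict used only to show the split.
example = {
    "band": "route_to_fix",
    "flaws": [],
    "feedback_artifacts": [{"kind": "annotated_png", "ref": "artifact://example-1"}],
}
bundle, refs = split_feedback(example)
```

An agent that only needs text feedback can work from `bundle` alone and never download the media behind `refs`.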

Integrations

Three entry points.

MCP

Use it from Claude, Codex, Cursor, or any runtime that supports MCP.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

A one-shot scoring call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

Python SDK

Let the client drive produce → score → revise until the work reaches ship.

from otterloop import OtterLoopClient

# my_agent is a placeholder for your own agent object.
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5,
    target_band="ship",
)

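Developer console

Get an eval API key

Mint an eval API key for your account, then copy the MCP, Python SDK, or curl configuration to connect any agent to SeaOtter's adversarial critic. The secret is shown only once, so store it before you leave this page.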


AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key: An org user creates an account at /signup, signs in, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY or as the bearer token.
2. Connect (MCP or HTTP): Drop the .mcp.json below into Claude or Cursor (Codex takes the same server under [mcp_servers.otterloop] in config.toml) for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. Install with pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score: POST /api/v1/eval/feedback with the artifact and the prompt the agent was given (plus optional policy_id, locale, references). The response is { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws: Each flaw carries criterion, severity, evidence, detail, and an anchor that says where the problem is (bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate: Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate to re-score. Repeat until the band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark: POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite, per-step, and chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.
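
Steps 3 to 5 above can be sketched as a small driver loop. This is an illustration, not the SDK's implementation: `run_loop`, `fake_score`, and `fake_revise` are hypothetical names, `score` stands in for both the first feedback call and the later iterate calls, and the stand-in critic is invented so the example runs offline.

```python
from typing import Callable


def run_loop(
    score: Callable[[str], dict],
    revise: Callable[[str, dict], str],
    work: str,
    target_band: str = "ship",
    max_rounds: int = 5,
) -> tuple[str, dict]:
    """Score, revise, and re-score until the band matches or rounds run out."""
    verdict = score(work)
    for _ in range(max_rounds):
        if verdict["band"] == target_band:
            break
        work = revise(work, verdict)  # the agent fixes the reported flaws
        verdict = score(work)         # re-score the revision
    return work, verdict


def fake_score(text: str) -> dict:
    # Stand-in critic: flags the work until it carries a source marker.
    if "[source]" in text:
        return {"band": "ship", "flaws": [], "upgrades": []}
    return {"band": "route_to_fix", "flaws": [{"criterion": "source_grounding"}], "upgrades": []}


def fake_revise(text: str, verdict: dict) -> str:
    # Stand-in agent: appends the missing marker.
    return text + " [source]"


final, verdict = run_loop(fake_score, fake_revise, "Revenue grew 12%.")
```

Capping the loop with `max_rounds` keeps an agent from re-scoring forever when a revision never clears the gate.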

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
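
Because the secret is shown only once, an agent may want to sanity-check what it was handed before storing it. The sketch below assumes the `sk-otter-<40 hex>` shape stated on this page; the function names are hypothetical, and accepting uppercase hex is an assumption.

```python
import re

# Shape stated on this page: "sk-otter-" followed by 40 hex characters.
# Assumption: both lowercase and uppercase hex digits are accepted here.
KEY_PATTERN = re.compile(r"sk-otter-[0-9a-fA-F]{40}")


def looks_like_eval_key(value: str) -> bool:
    """Return True when the value has the documented key shape."""
    return KEY_PATTERN.fullmatch(value) is not None


def redact(key: str) -> str:
    """Keep a 14-character prefix for logs, the same length as the key_prefix example."""
    return key[:14] + "..."
```

Logging only the redacted prefix lets you tell keys apart without writing the secret to disk.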

2 — connect over MCP (.mcp.json)

Claude / Cursor. Codex uses [mcp_servers.otterloop] in config.toml.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

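If you would rather generate the file than paste a secret into it by hand, the same configuration can be rendered from code. This is a minimal sketch: `mcp_config` is a hypothetical helper, and the command, module path, and environment variable names are copied from the snippet above.

```python
import json


def mcp_config(api_key: str, policy_id: str, api_url: str = "https://api.seaotter.ai") -> str:
    """Render the .mcp.json text for one OtterLoop MCP server entry."""
    config = {
        "mcpServers": {
            "otterloop": {
                "command": "python",
                "args": ["-m", "otterloop.mcp_server"],
                "env": {
                    "OTTERLOOP_API_URL": api_url,
                    "OTTERLOOP_API_KEY": api_key,
                    "OTTERLOOP_POLICY_ID": policy_id,
                },
            }
        }
    }
    return json.dumps(config, indent=2)
```

Writing the returned text to `.mcp.json` with the key read from an environment variable avoids committing the secret to source control.
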
3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }
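
The same call can be made from Python with only the standard library. The sketch below mirrors the curl request and the response shape in the comment above; `build_feedback_request` and `parse_feedback` are hypothetical helper names, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_URL = "https://api.seaotter.ai"  # prod base listed on this page


def build_feedback_request(api_key: str, prompt: str, work: str, policy_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/eval/feedback request."""
    body = {
        "modality": "text",
        "policy_id": policy_id,
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": work}],
        "return_feedback_artifacts": True,
    }
    return urllib.request.Request(
        f"{API_URL}/api/v1/eval/feedback",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_feedback(raw: bytes) -> tuple[str, str, list]:
    """Pull run_id, band, and flaws out of a response shaped like the comment above."""
    data = json.loads(raw)
    verdict = data["verdict"]
    return data["run_id"], verdict["band"], verdict.get("flaws", [])
```

To send it for real, `urllib.request.urlopen(request).read()` yields the bytes that `parse_feedback` expects; keep the returned `run_id` for the iterate call.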

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

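A Python counterpart to the iterate call, again using only the standard library. The body fields are copied from the curl example above; `build_iterate_request` is a hypothetical helper, and URL-quoting the run id is a defensive assumption, not something the API is known to require.

```python
import json
import urllib.request
from urllib.parse import quote

API_URL = "https://api.seaotter.ai"  # prod base listed on this page


def build_iterate_request(
    api_key: str, run_id: str, revised_text: str, artifact_ref: str = "inline:v2"
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/eval/runs/{id}/iterate request."""
    body = {
        "decision": "reprompt",
        "new_artifact_ref": artifact_ref,
        "artifact_parts": [{"mime_type": "text/plain", "text": revised_text}],
    }
    return urllib.request.Request(
        f"{API_URL}/api/v1/eval/runs/{quote(run_id, safe='')}/iterate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

The `run_id` comes from the first scoring response, so each revision is re-scored against the same run.
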
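Verdict contract

Agents act on one schema.

The verdict is designed for frontier agents, not for human-review screenshots. It carries a score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema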

{
  "score": 91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
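
To show how an agent might act on this schema, the sketch below pairs each flaw with the text its span anchor points at and with any upgrade draft for the same criterion. It rests on assumptions: `span` is treated as `[start, end)` character offsets into the submitted text, only `"high"` severity appears on this page so the other ranks are guesses, and `flagged_regions` is a hypothetical name.

```python
# Assumption: severities other than "high" are not shown on this page.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def flagged_regions(work: str, verdict: dict) -> list[dict]:
    """List flaws, most severe first, with the anchored excerpt and a matching draft."""
    drafts = {u["target_criterion"]: u.get("draft") for u in verdict.get("upgrades", [])}
    flaws = sorted(
        verdict.get("flaws", []),
        key=lambda flaw: SEVERITY_RANK.get(flaw.get("severity"), 99),
    )
    regions = []
    for flaw in flaws:
        anchor = flaw.get("anchor", {})
        excerpt = None
        if anchor.get("kind") == "span":
            start, end = anchor["span"]  # assumed [start, end) character offsets
            excerpt = work[start:end]
        regions.append(
            {
                "criterion": flaw["criterion"],
                "excerpt": excerpt,
                "draft": drafts.get(flaw["criterion"]),
            }
        )
    return regions
```

Other anchor kinds (bbox, cell, slide, page, timestamp) are left with `excerpt` set to `None` here because their field layout is not shown on this page.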

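Conditioning

The critic is conditioned on your standards.

OtterLoop is not a generic "is it good" score. The contract can condition the verdict on your organization's policy, the prompt or intent the agent was given, and the reference files the work must comply with.

Organization policy

Apply the right acceptance policy so the same artifact can pass for one team and fail for another, each for a valid reason.

Prompt and intent

Pass the original request to the critic so it evaluates the work against the task, not against an idealized answer.

Reference files

Brand guides, gold examples, source-of-truth documents, and prior iterations all become conditioning evidence.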
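
The three conditioning inputs above map onto optional fields of the scoring request. The sketch below assembles such a body; `conditioned_payload` is a hypothetical helper, and using `references` as the body key with a list of URIs is an assumption based on the SDK example and the quickstart wording on this page.

```python
from typing import Optional, Sequence


def conditioned_payload(
    work: str,
    prompt: str,
    policy_id: Optional[str] = None,
    references: Optional[Sequence[str]] = None,
    locale: Optional[str] = None,
) -> dict:
    """Build a scoring body that carries whichever conditioning inputs are given."""
    payload = {
        "modality": "text",
        "prompt": prompt,  # the original request, so work is judged against the task
        "artifact_parts": [{"mime_type": "text/plain", "text": work}],
    }
    if policy_id:
        payload["policy_id"] = policy_id  # organization acceptance policy
    if references:
        payload["references"] = list(references)  # assumed key name for reference files
    if locale:
        payload["locale"] = locale
    return payload
```

Leaving the optional fields out of the body, instead of sending nulls, keeps an unconditioned call identical to the minimal curl example.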

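Anchors localize to bbox, point, span, cell, slide, page, and timestamp.
The band is a runtime policy decision, not model prose pretending to be a gate.
Rich returns let the same verdict drive both human review and machine revision.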
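
Because the band is a discrete value, a gate can be a plain comparison in code. The sketch below assumes the four bands named on this page are ordered from most to least acceptable in the order they are listed; that ordering and the name `clears_gate` are assumptions.

```python
# Assumption: bands run from most to least acceptable in the order listed on this page.
BAND_ORDER = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(band: str, gate: str = "ship") -> bool:
    """Return True when `band` is at least as acceptable as `gate`.

    Raises ValueError for a band name that is not in BAND_ORDER.
    """
    return BAND_ORDER.index(band) <= BAND_ORDER.index(gate)
```

With the default gate only `ship` passes; a looser pipeline could set `gate="route_to_fix"` to let fixable work through to a repair step.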

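Modalities

Multimodal in. Rich multimodal out.

The same loop covers text, code, images, slides, documents, spreadsheets, audio, video, and multi-step trajectories. Returns can include the canonical verdict JSON plus media that either a human or an agent can read.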


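Modality	Returns
Image or design frame	Annotated PNG, flaw bounding boxes, Markdown report
Slides, PDF, or document	Annotated pages, per-page notes, machine-readable anchors
Spreadsheet	Flagged cells, criterion-based notes, structured deltas
Video or audio	Timestamp markers, captions, localized rationale
Text or code	Agent-applicable upgrade drafts and span-anchored review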
