Build with OTTERLOOP

Wire the critic into an agent in minutes.

OtterLoop is the agent-facing contract for SeaOtter's adversarial critic. The same loop runs on or off AgentOS, across any framework, model, and cloud: submit work, read the verdict, revise, and repeat until the band clears your gate.

MCP · HTTP · Python SDK · Multimodal artifacts · Localized feedback

Integration

Three copy-paste starts. One contract.

Everything routes to the same evaluation contract. The hosted API owns critic execution, conditioning, localization, rich returns, and the signed audit log. The MCP server and the Python SDK are thin wrappers over that same HTTP surface.

Localize the `detail`, `rationale`, and `upgrades` fields by locale.
Anchor to spans, cells, slides, pages, frames, or timestamps.
Fetch rendered artifacts separately when the agent needs the media bytes.
Keep the canonical feedback bundle as JSON so automation has a safe fallback.

Integration

Three ways in.

MCP

Use it in Claude, Codex, Cursor, or any runtime that speaks MCP.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://dev-api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

curl

A one-shot evaluation call over HTTP.

curl -s https://dev-api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance", "locale":"ja",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"..."}],
        "return_feedback_artifacts": true }'

Python SDK

Let the client drive produce → evaluate → revise until the work ships.

from otterloop import OtterLoopClient

# my_agent is your own agent object; it is not part of the SDK.
otter = OtterLoopClient(policy_id="acme-prod-acceptance", locale="ja")
final = otter.loop(
    produce=lambda feedback: my_agent.revise(feedback),
    work=my_agent.first_draft(),
    modality="document",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
    max_rounds=5,
    target_band="ship",
)
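
As a rough illustration of the produce → evaluate → revise cycle, the sketch below drives the two HTTP endpoints this page documents (POST /api/v1/eval/feedback, then POST /api/v1/eval/runs/{id}/iterate) through an injected `post` callable. It is not the SDK's implementation. `run_loop`, the `post` signature, the `inline:v{n}` naming beyond the documented `inline:v2`, and the assumption that the iterate response also carries a `verdict` are all hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical transport: (path, json_body) -> parsed JSON response.
Post = Callable[[str, dict], dict]


def run_loop(
    post: Post,
    produce: Callable[[dict], str],
    first_draft: str,
    prompt: str,
    target_band: str = "ship",
    max_rounds: int = 5,
) -> Tuple[str, dict]:
    """Score a draft, then revise and re-score until the band matches or rounds run out."""
    work = first_draft
    response = post(
        "/api/v1/eval/feedback",
        {
            "modality": "text",
            "prompt": prompt,
            "artifact_parts": [{"mime_type": "text/plain", "text": work}],
        },
    )
    run_id = response["run_id"]
    verdict = response["verdict"]

    # Round 1 was the initial score; rounds 2..max_rounds are revisions.
    for round_number in range(2, max_rounds + 1):
        if verdict["band"] == target_band:
            break
        work = produce(verdict)  # the agent revises against the latest verdict
        response = post(
            f"/api/v1/eval/runs/{run_id}/iterate",
            {
                "decision": "reprompt",
                "new_artifact_ref": f"inline:v{round_number}",  # assumed naming
                "artifact_parts": [{"mime_type": "text/plain", "text": work}],
            },
        )
        verdict = response["verdict"]  # assumed response shape
    return work, verdict
```

Injecting `post` keeps the sketch testable without network access. In practice it would wrap an HTTP client that adds the Authorization header.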

Developer console

Get an eval API key

Create an eval API key for your account, then copy the ready-to-paste MCP, Python SDK, or curl setup to wire any agent to SeaOtter's adversarial critic. The secret is shown once, so store it before you leave this page.


AGENT QUICKSTART

Onboard an agent without a human in the loop after the first key.

SeaOtter is agent-native: an agent can discover this contract from /llms.txt, then run the whole loop over MCP or plain HTTP. The only human step today is minting the first eval key (above) — the agent does the rest. Every eval call carries Authorization: Bearer <sk-otter-...>. Bases: https://api.seaotter.ai (prod), https://dev-api.seaotter.ai (dev).

1. Get a key: An org user creates an account at /signup, signs in, then mints an eval key here (POST /api/v1/agent-keys, body {"name":"my-agent"}). The full sk-otter-<40 hex> secret is shown exactly once. Hand it to the agent as OTTERLOOP_API_KEY (MCP) or as the bearer token (HTTP).
2. Connect (MCP or HTTP): Drop the .mcp.json below into Claude or Cursor (Codex uses [mcp_servers.otterloop] in config.toml) for read-only tools (otter_list_policies, otter_score, otter_iterate, otter_score_workflow, otter_get_feedback_artifact), or call the HTTP API directly. Install with pip install "otterloop[mcp]" (or pip install otterloop for the stdlib SDK).
3. Score: POST /api/v1/eval/feedback with the artifact and the prompt the agent was given (plus optional policy_id, locale, references). Get back { run_id, verdict }. The verdict has score (0-100, lower = more flawed), band (ship / route_to_fix / quarantine / block), flaws[], upgrades[].
4. Read the flaws: Each flaw carries criterion, severity, evidence, detail, and an anchor that says where the flaw is (bbox / timestamp / cell / slide / page / span). upgrades[] are concrete fixes the agent can apply.
5. Iterate: Revise the work against the flaws, then POST /api/v1/eval/runs/{id}/iterate, where {id} is the run_id, to re-score. Repeat until band clears your gate (e.g. ship). The Python SDK's otter.loop(produce=..., work=..., target_band="ship") drives this automatically.
6. Workflow / benchmark: POST /api/v1/eval/workflows/{id}/topology to score an end-to-end multi-step workflow graph and get a composite, per-step, and chain critique. Discover policies (GET /api/v1/eval/policies) and rubrics (GET /api/v1/eval/rubrics) to condition grading.

Machine-readable: /llms.txt · OpenAPI spec · interactive API docs.
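
Step 4 in the list above can be sketched as a small triage helper: order the flaws by severity and pair each one with the upgrades that target the same criterion. The field names (`criterion`, `severity`, `detail`, `anchor`, `target_criterion`, `draft`, `action`) come from the sample verdict on this page. The severity vocabulary other than "high" and the helper name `plan_revisions` are assumptions.

```python
from typing import Dict, List

# Assumed severity vocabulary; the sample verdict on this page only shows "high".
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def plan_revisions(verdict: dict) -> List[dict]:
    """Pair each flaw with the upgrades targeting its criterion, worst flaws first."""
    upgrades_by_criterion: Dict[str, List[dict]] = {}
    for upgrade in verdict.get("upgrades", []):
        key = upgrade.get("target_criterion", "")
        upgrades_by_criterion.setdefault(key, []).append(upgrade)

    # Unknown severities sort after every known one.
    flaws = sorted(
        verdict.get("flaws", []),
        key=lambda flaw: SEVERITY_RANK.get(flaw.get("severity", ""), len(SEVERITY_RANK)),
    )

    plan = []
    for flaw in flaws:
        matching = upgrades_by_criterion.get(flaw.get("criterion", ""), [])
        plan.append(
            {
                "criterion": flaw.get("criterion"),
                "severity": flaw.get("severity"),
                "where": flaw.get("anchor"),
                "problem": flaw.get("detail"),
                # Prefer the ready-made draft, fall back to the action text.
                "fixes": [u.get("draft") or u.get("action") for u in matching],
            }
        )
    return plan
```

The resulting list can be turned into a revision prompt for the agent, one entry per flaw.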

1 — mint a key (one-time, signed-in user)

Returns the full sk-otter-... secret exactly once.

curl -s -X POST https://api.seaotter.ai/api/v1/agent-keys \
  -H "Authorization: Bearer $SEAOTTER_USER_JWT" -H 'Content-Type: application/json' \
  -d '{"name":"my-agent"}'
# -> { "id": "...", "key": "sk-otter-...", "key_prefix": "sk-otter-abcde", ... }
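
Because the secret is returned only once, it may help to sanity-check the value before storing it and to avoid logging it in full. This is a minimal sketch, assuming the key is exactly `sk-otter-` followed by 40 hex characters as stated in step 1 (accepting both letter cases is an assumption) and that a short prefix like the `key_prefix` field is safe to log. Both helper names are hypothetical.

```python
import re

# Assumed shape from the quickstart text: "sk-otter-" + 40 hex characters.
_EVAL_KEY_RE = re.compile(r"sk-otter-[0-9a-fA-F]{40}")
_PREFIX = "sk-otter-"


def looks_like_eval_key(value: str) -> bool:
    """Return True when the whole string matches the sk-otter-<40 hex> shape."""
    return _EVAL_KEY_RE.fullmatch(value) is not None


def redact_key(value: str, visible: int = 5) -> str:
    """Keep the fixed prefix plus a few characters, similar to key_prefix."""
    if not value.startswith(_PREFIX):
        return "<redacted>"
    return value[: len(_PREFIX) + visible] + "..."
```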

2 — connect over MCP (.mcp.json)

For Claude and Cursor. Codex uses [mcp_servers.otterloop] in config.toml instead.

{ "mcpServers": { "otterloop": {
    "command": "python", "args": ["-m", "otterloop.mcp_server"],
    "env": { "OTTERLOOP_API_URL": "https://api.seaotter.ai",
             "OTTERLOOP_API_KEY": "sk-otter-...",
             "OTTERLOOP_POLICY_ID": "acme-prod-acceptance" } } } }

3 — score over HTTP

One-shot grade -> verdict + run_id to keep iterating.

curl -s https://api.seaotter.ai/api/v1/eval/feedback \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "modality":"text", "policy_id":"acme-prod-acceptance",
        "prompt":"Draft the Q3 incident postmortem",
        "artifact_parts":[{"mime_type":"text/plain","text":"...your work..."}],
        "return_feedback_artifacts": true }'
# -> { "run_id": "...", "verdict": { "score": ..., "band": "route_to_fix", "flaws": [...] } }
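
The same call can be made from Python with only the standard library. The sketch below mirrors the curl command above: same path, headers, and body fields. The function names and the `timeout` default are illustrative choices, not part of the otterloop package.

```python
import json
import urllib.request
from typing import Optional


def build_feedback_request(
    base_url: str,
    api_key: str,
    prompt: str,
    text: str,
    policy_id: Optional[str] = None,
    locale: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/eval/feedback request."""
    payload = {
        "modality": "text",
        "prompt": prompt,
        "artifact_parts": [{"mime_type": "text/plain", "text": text}],
        "return_feedback_artifacts": True,
    }
    # Optional conditioning fields are only sent when provided.
    if policy_id is not None:
        payload["policy_id"] = policy_id
    if locale is not None:
        payload["locale"] = locale
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/eval/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body, e.g. {"run_id": ..., "verdict": ...}."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting construction from sending makes the payload easy to inspect or log (minus the key) before any network call happens.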

5 — iterate until it ships

Re-score a revision against the same run.

curl -s -X POST https://api.seaotter.ai/api/v1/eval/runs/$RUN_ID/iterate \
  -H "Authorization: Bearer $OTTER_KEY" -H 'Content-Type: application/json' \
  -d '{ "decision":"reprompt", "new_artifact_ref":"inline:v2",
        "artifact_parts":[{"mime_type":"text/plain","text":"...revised work..."}] }'

Verdict contract

The agent acts on one schema.

The verdict is designed for frontier agents, not for human-review screenshots. It carries the score, band, flaws, upgrades, anchors, rationale, and references to rich feedback artifacts that the agent can use directly.

Verdict schema

{
  "score": 0.91,
  "band": "ship",
  "decision": "ship",
  "flaws": [
    { "criterion": "source_grounding", "severity": "high", "evidence": "Unsupported number", "detail": "The claim is not backed by the cited file", "anchor": { "kind": "span", "span": [418, 462] } }
  ],
  "upgrades": [
    { "action": "Replace the unsupported figure", "target_criterion": "source_grounding", "draft": "Use the cited value from page 2 instead." }
  ],
  "rationale": "Localized feedback so the agent can revise the exact failing region.",
  "feedback_artifacts": [{ "kind": "annotated_png", "ref": "artifact://..." }]
}
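
One way an agent might consume this schema is to load the flaws into typed records and use a span anchor to pull out the exact region to revise. The sketch assumes that `span` holds `[start, end)` character offsets into the submitted text. The page does not say whether offsets are characters or bytes, or whether the end is inclusive, so treat that as a guess. `Flaw`, `parse_flaws`, and `flagged_text` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flaw:
    criterion: str
    severity: str
    evidence: str
    detail: str
    anchor: dict


def parse_flaws(verdict_json: str) -> List[Flaw]:
    """Read the flaws[] array of a verdict JSON document into Flaw records."""
    verdict = json.loads(verdict_json)
    return [
        Flaw(
            criterion=item.get("criterion", ""),
            severity=item.get("severity", ""),
            evidence=item.get("evidence", ""),
            detail=item.get("detail", ""),
            anchor=item.get("anchor") or {},
        )
        for item in verdict.get("flaws", [])
    ]


def flagged_text(work: str, flaw: Flaw) -> Optional[str]:
    """Return the slice of `work` a span anchor points at, or None for other kinds."""
    if flaw.anchor.get("kind") != "span":
        return None
    start, end = flaw.anchor["span"]  # assumed [start, end) character offsets
    return work[start:end]
```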

Conditioning

The critic is conditioned on your bar.

OtterLoop is not a generic "is this good" score. The contract can condition the verdict on your organization's policy, on the prompt or intent the agent was given, and on the reference files the work must honor.

Organization policy

Apply the right acceptance policy, so the same artifact can pass for one team and fail for another, each for defensible reasons.

Prompt and intent

Pass the original request to the critic so it judges the work against the assignment, not against a generic ideal answer.

Reference files

Brand guides, golden examples, sources of truth, and prior iterations all become conditioning evidence.
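
To make the three conditioning inputs concrete, the sketch below layers them onto a base request body. `policy_id` and `prompt` appear in the curl examples on this page. Sending `references` as a list of URI strings in the HTTP body is an assumption based on the SDK call and the quickstart's mention of optional references. The helper name and the second policy id are hypothetical.

```python
from typing import Iterable, Optional


def with_conditioning(
    payload: dict,
    *,
    policy_id: Optional[str] = None,
    prompt: Optional[str] = None,
    references: Optional[Iterable[str]] = None,
) -> dict:
    """Return a copy of payload with only the conditioning inputs that were given."""
    conditioned = dict(payload)  # shallow copy; the caller's dict is left untouched
    if policy_id is not None:
        conditioned["policy_id"] = policy_id
    if prompt is not None:
        conditioned["prompt"] = prompt
    if references:
        conditioned["references"] = list(references)  # assumed field shape
    return conditioned


base = {
    "modality": "text",
    "artifact_parts": [{"mime_type": "text/plain", "text": "..."}],
}

# Same artifact, two acceptance policies: the verdicts may legitimately differ.
for_prod = with_conditioning(
    base,
    policy_id="acme-prod-acceptance",
    prompt="Draft the Q3 incident postmortem",
    references=["file://brand-guide.pdf", "file://gold-postmortem.md"],
)
for_legal = with_conditioning(base, policy_id="acme-legal-review")  # hypothetical policy id
```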

Anchors localize to a bbox, point, span, cell, slide, page, or timestamp.
The band is a runtime policy decision, not model prose pretending to be a gate.
Rich returns let the same verdict drive both human review and automated revision.
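
Since the band is a structured decision rather than prose, a client can gate on the `band` field directly instead of parsing the rationale. The sketch below assumes the four bands are ordered from most to least shippable in the order the quickstart lists them (ship / route_to_fix / quarantine / block). That ordering and the `clears_gate` helper are assumptions, not documented behavior.

```python
# Assumed ordering, best first, following the order the quickstart lists the bands in.
BAND_ORDER = ["ship", "route_to_fix", "quarantine", "block"]


def clears_gate(band: str, gate: str = "ship") -> bool:
    """True when `band` is at least as good as `gate`; unknown values never clear."""
    if band not in BAND_ORDER or gate not in BAND_ORDER:
        return False
    return BAND_ORDER.index(band) <= BAND_ORDER.index(gate)
```

Failing closed on unknown band names keeps a typo or a newly introduced band from silently passing the gate.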

Modalities

Multimodal in. Richly multimodal out.

The same loop covers text, code, images, decks, documents, spreadsheets, audio, video, and multi-step workflows. Returns can include both the canonical verdict JSON and media that a human or an agent can read.


Modality	Returns
Image or design frame	Annotated PNG with flaw bounding boxes and a markdown report
Deck, PDF, or document	Annotated pages, per-page notes, and machine-readable anchors
Spreadsheet	Flagged cells, criterion-grounded notes, and structured diffs
Video or audio	Timestamp markers, captions, and localized rationale
Text or code	Span-anchored review with upgrade drafts the agent can apply
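
Because every modality reports its location through an anchor, an agent can render all of them with one small function. Only the span shape (`{"kind": "span", "span": [start, end]}`) is shown on this page, so the sketch handles that case explicitly and prints any other anchor generically from whatever keys it carries, without guessing field names. `describe_anchor` is a hypothetical helper.

```python
def describe_anchor(anchor: dict) -> str:
    """Render an anchor as a short human-readable location string."""
    kind = anchor.get("kind", "unknown")
    if kind == "span" and "span" in anchor:
        start, end = anchor["span"]
        return f"span {start}-{end}"
    # Other kinds (bbox, cell, slide, page, timestamp): field names are not
    # documented here, so list whatever keys are present instead of assuming.
    details = ", ".join(
        f"{key}={value}" for key, value in sorted(anchor.items()) if key != "kind"
    )
    return f"{kind} ({details})" if details else kind
```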
