LIVE CRITIC DEMO

Grade an artifact. Revise it. Re-grade it.

This demo runs against the real eval runtime in this codebase. It creates a run, fetches the score, lets you revise the draft, and iterates with the same hostile critic loop the product exposes to agents.

Runtime contract

The page posts to `/api/v1/eval/runs`, fetches `/api/v1/eval/runs/{id}/score`, and iterates through `/api/v1/eval/runs/{id}/iterate` with `decision=re_prompt`.
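The three calls above can be sketched as a request plan. This is a minimal sketch, not the page's actual client code. Only the paths and the `re_prompt` value come from the contract. The HTTP methods for the score and iterate calls, the body field names (`intent`, `artifact`, `rubric`, `pushback`), and sending `decision` in the JSON body rather than the query string are assumptions made for illustration.

```python
from dataclasses import dataclass, field

RUNS_PATH = "/api/v1/eval/runs"  # path taken from the runtime contract


@dataclass
class Request:
    """A description of one HTTP call; nothing is sent over the network."""
    method: str
    path: str
    body: dict = field(default_factory=dict)


def create_run(intent: str, artifact: str, rubric: str) -> Request:
    # Body field names are hypothetical; the contract only names the path.
    body = {"intent": intent, "artifact": artifact, "rubric": rubric}
    return Request("POST", RUNS_PATH, body)


def fetch_score(run_id: str) -> Request:
    # GET is assumed; the contract only says the page "fetches" the score.
    return Request("GET", f"{RUNS_PATH}/{run_id}/score")


def iterate_run(run_id: str, revised_artifact: str, pushback: str) -> Request:
    # decision=re_prompt comes from the contract; carrying it in the body,
    # and the other field names, are assumptions.
    body = {
        "decision": "re_prompt",
        "artifact": revised_artifact,
        "pushback": pushback,
    }
    return Request("POST", f"{RUNS_PATH}/{run_id}/iterate", body)


def grade_cycle(run_id: str, intent: str, draft: str, revision: str, rubric: str) -> list:
    """Return the calls for one grade, revise, re-grade cycle, in order."""
    return [
        create_run(intent, draft, rubric),
        fetch_score(run_id),
        iterate_run(run_id, revision, intent),
        fetch_score(run_id),
    ]
```

In a real client the run id would come from the response to the first call; here it is passed in so the sketch stays free of network access.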

Rubrics load from the public `/api/v1/eval/rubrics` listing.
If the live runtime requires auth, the page falls back to a canned verdict instead of rendering a blank result.
The delta view is computed client-side from the flaw set before and after revision.
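One way to picture the delta computation is as plain set arithmetic over the flaws reported before and after a revision. The sketch below assumes each flaw carries a stable string identifier that survives re-grading; the function name `flaw_delta`, the three bucket names, and the sample identifiers are hypothetical and are not taken from the page's code.

```python
def flaw_delta(before, after):
    """Compare two flaw lists and bucket the identifiers by what changed."""
    before_set, after_set = set(before), set(after)
    return {
        # Present before the revision, gone afterwards.
        "resolved": sorted(before_set - after_set),
        # Absent before the revision, reported afterwards.
        "introduced": sorted(after_set - before_set),
        # Reported in both grades.
        "persisting": sorted(before_set & after_set),
    }


# Hypothetical flaw identifiers, for illustration only.
first_grade = ["vague-claim", "missing-source", "weak-cta"]
second_grade = ["weak-cta", "new-typo"]

delta = flaw_delta(first_grade, second_grade)
# delta == {
#     "resolved": ["missing-source", "vague-claim"],
#     "introduced": ["new-typo"],
#     "persisting": ["weak-cta"],
# }
```

If the critic rewords a flaw between runs, identifier matching like this would count it as one resolved flaw plus one introduced flaw, so a real implementation may need a looser comparison.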

Sample artifact

Rubric

Loading live rubrics…

Prompt or intent

The live demo sends this as the run intent and, on iteration, as the critic-facing pushback for the revised artifact.

Artifact

Paste text directly or start from a sample. The samples cover multiple modalities using text, TSV, outline, and transcript representations that the live API in this repo can score today.

Ready to grade.

Verdict

Run a grade to see live flaws, upgrades, and revision deltas.