🦦SeaOtter
Applying to YC · building in public

Software that builds
and optimizes itself

Self-optimizing AI infrastructure for agentic applications. An RL + GAN framework where AI evaluators score the product, AI engineers iterate the code, and the architecture learns from its own failures.

Request early access · See how it works
20 autonomous rounds
828 git commits
287 AI debates
0 human engineering

Every AI agent team hits the same wall

Getting from prototype to production is brutally manual. Engineers spend months hand-tuning prompts, evaluation frameworks, and orchestration logic, only for it all to break every time the underlying models change.

🔄

Manual iteration

Prompt engineering, eval tuning, and orchestration changes require human engineers at every step. The cycle time is weeks, not minutes.

🎯

No feedback loop

Current tools help you BUILD agents but not IMPROVE them. There's no automated way to evaluate output, identify failures, and iterate.

💸

Doesn't scale

Every model update, every new capability, every edge case requires human intervention. Engineering teams become the bottleneck.

An RL + GAN framework for autonomous development

Software that evaluates itself, writes its own code, tests with simulated users, and iterates: 300 times. No human engineering required.

1
🎯

Discriminator Board evaluates

7 AI evaluators (modeled after world-class VCs and product leaders) independently test the product via browser, research competitors, and score output against multi-dimensional objectives.

2
📋

PMs translate the gradient

7 AI project managers translate the board's critique into sprint tickets. Every board concern maps to an engineering task, and gradient propagation is measured as an M/N ratio.

3
⚡

Generator workforce iterates

Engineering squads write real code in parallel git worktrees. Real tests, real commits. The architecture optimizes itself through adversarial learning.

4
👥

Users provide ground truth

10 simulated beta testers interact with the live product and provide NPS scores. User feedback is the ground truth that calibrates the discriminator.
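The four steps above form one round of the loop. As a toy illustration of the step-2 metric, here is the M/N ratio as code; the `propagation_ratio` name, the dict shapes, and id-based matching are all assumptions for the sketch, not SeaOtter's actual schema:

```python
# Toy version of "gradient propagation": the share of board concerns (N)
# that produced at least one sprint ticket (M). Field names are invented.
def propagation_ratio(concerns: list, tickets: list) -> float:
    """M/N: fraction of board concerns with at least one mapped ticket."""
    addressed = {t.get("concern_id") for t in tickets}
    covered = sum(1 for c in concerns if c["id"] in addressed)
    return covered / len(concerns) if concerns else 1.0
```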

// RL + GAN Architecture
┌─── Environment (Ground Truth) ───┐
│  Beta tester NPS                 │
│  Competitor landscape            │
│  Technical health (tests, build) │
└──────────┬───────────────────────┘
           │
           ▼
  ┌─── Reward R(t) ───┐
  │ 0.25 × PMF        │
  │ 0.20 × Board      │
  │ 0.15 × Moat       │
  │ 0.15 × Design     │
  │ 0.10 × Technical  │
  │ 0.10 × Competitive│
  │ 0.05 × Founder    │
  └────────┬──────────┘
           │
           ▼
  ┌─── Policy π(s) ───┐
  │ EXPLOIT / EXPLORE │
  │ PIVOT / RESEARCH  │
  │ CONSOLIDATE       │
  └────────┬──────────┘
           │
           ▼
  ┌── Generator ──┐
  │ 7 PMs         │
  │ 2+ Eng squads │
  │ 10 Users      │
  └───────────────┘
// Self-correction in action
R14: Board scores 7/10
R15: Ground truth check → score drops to 4.1
R17: Adversarial pressure → 2.9/10
R18: Real fixes → recovery begins at 4.0
// System detected hallucination and self-corrected
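The reward panel above reduces to a small function. This sketch uses the weights exactly as published; the thresholds in `select_policy` are invented for illustration, since the page names the actions but not the decision rule:

```python
# R(t) as a weighted sum over the seven scoring dimensions, with the
# weights taken directly from the diagram (they sum to 1.0).
REWARD_WEIGHTS = {
    "pmf": 0.25, "board": 0.20, "moat": 0.15, "design": 0.15,
    "technical": 0.10, "competitive": 0.10, "founder": 0.05,
}

def reward(scores: dict) -> float:
    """R(t): weighted sum of per-dimension scores (each on a 0-10 scale)."""
    return sum(w * scores[k] for k, w in REWARD_WEIGHTS.items())

def select_policy(r_t: float, r_prev: float) -> str:
    """Illustrative switch over the actions listed in the diagram;
    the cutoff values are assumptions, not the real policy."""
    if r_t < 3.0:
        return "PIVOT"        # score collapsed: rethink direction
    if r_t < r_prev - 1.0:
        return "RESEARCH"     # sharp drop: investigate before building
    if r_t > 7.0:
        return "EXPLOIT"      # winning: double down
    return "EXPLORE"          # otherwise keep searching
```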

4 days. 20 rounds. Zero human engineering.

Real results from a continuous autonomous run on a single Mac Studio (128GB Apple Silicon).

828 Git commits
287 Cross-agent debates
140 Sprint tickets
75 User test sessions

Board Score Trajectory

The non-monotonic curve is by design: a discriminator that only goes up has collapsed.

R2: 7/10 · Finding direction
R4: 5.1/10 · Board demands action
R7: 7/10 · Engineering delivered
R9: 5/10 · Report crisis
R14: 7/10 · Peak (pre-correction)
R17: 2.9/10 · Ground truth enforced
R18: 4/10 · Real recovery begins
R20: 4/10 · Stabilizing

The R15-R17 drop: the system detected hallucinated metrics (engineering claimed 10B stores; reality was 4). The board independently verified via the live product and crashed the score, which is exactly what adversarial learning should do.

Technical architecture

Built on reinforcement learning with a GAN-style adversarial discriminator component.

Multi-agent orchestration

7 board evaluators + 7 PMs + engineering squads + 10 users run as parallel AI sessions. Each agent has persistent memory across rounds.

Claude Opus 4.6, 7-10 parallel sessions

Local inference backbone

Qwen 3.5 122B runs locally on Apple Silicon via MLX at 42 tok/s. The orchestrator never sends sensitive data to external APIs.

MLX 4-bit, 128GB unified memory

Semantic memory system

944 files indexed with bge-m3 embeddings. Hybrid BM25 + vector search enables experience replay across 300 rounds.

SQLite + bge-m3, 5,525 embeddings
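The "hybrid BM25 + vector search" idea can be sketched as a score blend. Only the BM25 + embedding combination itself comes from the page; the min-max normalization, the 0.5 blend weight, and the function names are assumptions:

```python
# Illustrative hybrid retrieval: normalize the lexical (BM25) and
# semantic (embedding) scores so they're comparable, then blend them.
def minmax(scores: dict) -> dict:
    """Scale a doc -> score map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(bm25: dict, vector: dict, alpha: float = 0.5) -> list:
    """Rank documents by a weighted blend of both signals."""
    b, v = minmax(bm25), minmax(vector)
    combined = {
        doc: alpha * b.get(doc, 0.0) + (1 - alpha) * v.get(doc, 0.0)
        for doc in set(b) | set(v)
    }
    return sorted(combined, key=combined.get, reverse=True)
```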

Self-healing infrastructure

3-minute heartbeat, watchdog cron for stall detection, auto-restart via LaunchAgents. Progress stalls > 60 min are automatically resolved.

OpenClaw, LaunchAgents, MLX
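The heartbeat-plus-watchdog pattern is simple to sketch. The file path, function names, and comparison logic below are illustrative assumptions, not SeaOtter's actual implementation; only the 3-minute heartbeat and 60-minute stall threshold come from the page:

```python
# Minimal stall detection: the orchestrator touches a heartbeat file
# every cycle; a watchdog restarts the run if the heartbeat goes stale.
import os
import tempfile
import time

HEARTBEAT_FILE = os.path.join(tempfile.gettempdir(), "seaotter.heartbeat")
STALL_SECONDS = 60 * 60  # stalls > 60 min trigger recovery

def beat() -> None:
    """Orchestrator side: refresh the heartbeat (run every ~3 minutes)."""
    with open(HEARTBEAT_FILE, "w") as f:
        f.write(str(time.time()))

def is_stalled(now=None) -> bool:
    """Watchdog side: stalled if no heartbeat, or the last one is too old."""
    if not os.path.exists(HEARTBEAT_FILE):
        return True
    age = (now if now is not None else time.time()) - os.path.getmtime(HEARTBEAT_FILE)
    return age > STALL_SECONDS
```

A cron job or LaunchAgent would call `is_stalled()` periodically and relaunch the orchestrator when it returns true.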

Adversarial evaluation

Score-adaptive difficulty: the discriminator gets HARDER as the product improves. Anti-convergence rules prevent groupthink at high scores.

GAN-style, 7 parallel evaluators

Multi-armed bandit exploration

Alternative ideas tracked as bandit arms with UCB scoring. Exploration rate decays from 30% to 1% over 300 rounds.

ε-greedy + UCB1, 8 tracked alternatives
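The bandit scheme above can be sketched in a few lines. The 30%-to-1% decay over 300 rounds and the UCB1 formula come from the page; the linear decay shape and the c = sqrt(2) exploration constant are assumptions:

```python
# Alternative ideas as bandit arms: UCB1 scoring with a decaying
# epsilon-greedy exploration rate.
import math
import random

def epsilon(round_n: int, total: int = 300, start: float = 0.30, end: float = 0.01) -> float:
    """Exploration probability, decayed linearly over the run."""
    frac = min(round_n / total, 1.0)
    return start + (end - start) * frac

def ucb1(mean_reward: float, pulls: int, total_pulls: int, c: float = math.sqrt(2)) -> float:
    """UCB1: exploitation term plus an uncertainty bonus for rarely tried arms."""
    if pulls == 0:
        return float("inf")  # untried arms are explored first
    return mean_reward + c * math.sqrt(math.log(total_pulls) / pulls)

def pick_arm(arms: dict, round_n: int, rng=random) -> str:
    """With probability epsilon, explore a random arm; otherwise pick
    the arm with the highest UCB1 score."""
    if rng.random() < epsilon(round_n):
        return rng.choice(list(arms))
    total = sum(a["pulls"] for a in arms.values()) or 1
    return max(arms, key=lambda k: ucb1(arms[k]["mean"], arms[k]["pulls"], total))
```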

Built by

Jinhua Wang

Solo founder & CEO

  • J.P. Morgan: Applied AI/ML Director. Built and scaled the first general-purpose LLM agent from 0 to 200K users.
  • Amazon: Trained multi-modal LLMs and 3B-parameter diffusion transformers on 112 A100 GPUs.
  • SeaOtter: Designed the RL + GAN framework, multi-agent orchestration, and autonomous development pipeline. Solo-built the entire platform.

The future of software builds itself.

We're looking for design partners building agentic AI applications who want to accelerate from prototype to production.

Get early access · Read the technical deep-dive
🦦 SeaOtter AI · Self-optimizing infrastructure for agentic applications
jin@seaotter.ai