Update · Research · 2026-06-27

Self-improve loop converged: baseline 0.2 → final 1.0

The GLEE self-improvement loop ran for 25 rounds and converged. The baseline policy score was 0.2; the final score is 1.0. All task classes now have optimized method assignments. The improvement was monotonic: the score did not regress in any round.

This update is part of the public GLEE activity stream. Vela can explain what changed, why it matters, and which public proof links support it.

Why it matters

The self-improve loop is central to GLEE's research architecture. In each round, the system proposes mutations to its own task-to-method decision policy, evaluates them against a grounded benchmark, and keeps only the improvements. In effect, the system is rewriting its own decision logic. Convergence at 1.0 means the current benchmark is saturated: it can no longer distinguish a better policy from the current one. The next meaningful step is grounding the benchmark in real routing outcomes, not just internal doctrine mappings.

Technical detail

The loop uses a local hill-climbing proposer against a grounded oracle that scores each policy candidate. In each round, the proposer mutates the method assignment for each task class, the oracle scores the resulting candidate, and the candidate is promoted only if its score improves on the current best. After 25 rounds the policy had converged: no further mutation improved the score. The next evolution is grounding the oracle in real task outcomes rather than synthetic benchmarks.
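
The propose, score, and promote cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GLEE's actual implementation: the task classes, the method names, the `REFERENCE` mapping, and `toy_oracle` are all hypothetical, and the proposer here simply tries every alternative method for each task class in turn, which is one simple way to realize local hill climbing.

```python
from typing import Callable, Dict, List, Tuple

Policy = Dict[str, str]  # task class -> assigned method

# Hypothetical reference mapping used only by the toy oracle below.
REFERENCE: Policy = {
    "classify": "m_a",
    "route": "m_b",
    "plan": "m_c",
    "verify": "m_b",
    "search": "m_c",
}
METHODS: List[str] = ["m_a", "m_b", "m_c"]


def toy_oracle(policy: Policy) -> float:
    """Toy score: fraction of task classes whose method matches REFERENCE."""
    hits = sum(policy[task] == method for task, method in REFERENCE.items())
    return hits / len(REFERENCE)


def hill_climb(
    policy: Policy,
    methods: List[str],
    oracle: Callable[[Policy], float],
    rounds: int = 25,
) -> Tuple[Policy, List[float]]:
    """Promote a single-assignment mutation only if it strictly improves the score."""
    best = dict(policy)
    best_score = oracle(best)
    history = [best_score]  # score before round 1, then after each round
    for _ in range(rounds):
        for task in sorted(best):
            for method in methods:
                if method == best[task]:
                    continue  # not a mutation
                candidate = dict(best)
                candidate[task] = method
                score = oracle(candidate)
                if score > best_score:  # keep only improvements
                    best, best_score = candidate, score
        history.append(best_score)
    return best, history


# Start from a policy that assigns the same method everywhere.
start = {task: "m_a" for task in REFERENCE}
final, history = hill_climb(start, METHODS, toy_oracle)
print(history[0], history[-1])
```

Because a candidate is promoted only on a strict improvement, the recorded score can never decrease from one round to the next, which is the monotonic behavior reported above. Once the oracle returns its maximum, no mutation can be promoted, so further rounds leave the policy unchanged.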