Phase 3 of this wave of the AI Revolution: Self‑Optimising, Self‑Adaptive, Self‑Play, and Evolutionary AI - A Contemporary Review (2020–2025)
- Christopher Foster-McBride

AI is trending toward systems that improve and adapt themselves: models that rewrite code or pipelines (self‑optimising), update their own parameters in deployment (self‑adaptive), learn via leagues of opponents (self‑play), and evolve solutions through population search (evolutionary AI).
Recent exemplars include the Darwin‑Gödel Machine (self‑improving coding agents), DeepMind’s AlphaEvolve (LLM‑orchestrated evolutionary algorithm/code discovery), SEAL (LLMs that self‑edit weights at run‑time), and open‑ended self‑play in XLand. Together, these developments point to a future of open‑ended, self‑driven AI—bringing new requirements for safety and evaluation.
Why read this article? Few realise just how seismic the shift in AI developments has been in the last few months. Phase 3 AI systems don’t just use algorithms—they learn to improve, adapt, and reinvent those algorithms themselves. That shift—from human-tuned AI to self-driven AI—changes the pace, direction, and ownership of progress.
Fundamentally, it means we will have to answer the question: 'Are humans a limiting factor in AI development?'
What changes in practice (next 3–5 years):
Discovery accelerates: Evolutionary and self-optimising loops will routinely uncover faster code, novel algorithms, and unexpected designs across science, engineering, and healthcare—shrinking R&D cycles from months to days.
Resilience becomes default: Self-adaptive models update safely in the field, handling drift, new guidelines, or rare edge cases without waiting for quarterly retrains.
Learning without labels scales: Self-play and self-reward methods generate rich curricula and feedback, pushing capability into domains where labelled data is scarce (e.g., hospital clinics, public services, safety-critical ops).
Phase 3 is not merely smarter models; it’s compounding improvement. Organisations and societies that master how systems improve themselves—and how to constrain, test, and govern that improvement—will capture outsized gains in productivity, safety, and innovation.
So, let's dive into the definitions and examples you will need to get familiar with!
Self‑optimising AI
An AI system that improves its own performance autonomously by changing its code, training recipe, or operating procedure (e.g., algorithmic edits, toolchains, prompts/pipelines, hyperparameters).
Think: “I rewrite or reconfigure myself to get better.”
Examples: Darwin‑Gödel Machine; DeepMind’s AlphaEvolve.
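To make the loop concrete, here is a minimal, hypothetical sketch (my illustration, not taken from either system): the program treats its own recipe as something it can edit, proposes a change, evaluates it empirically, and keeps the edit only if the score improves. The `propose_edit` and `evaluate` functions and the configuration fields are toy placeholders.

```python
import random

def evaluate(config: dict) -> float:
    """Hypothetical benchmark: rewards a learning rate near 3e-4 and more retrieval."""
    return -abs(config["learning_rate"] - 3e-4) * 1e3 + 0.01 * config["retrieval_k"]

def propose_edit(config: dict) -> dict:
    """Propose a small edit to the system's own recipe (toy placeholder)."""
    candidate = dict(config)
    if random.random() < 0.5:
        candidate["learning_rate"] *= random.choice([0.5, 2.0])
    else:
        candidate["retrieval_k"] = max(1, candidate["retrieval_k"] + random.choice([-1, 1]))
    return candidate

config = {"learning_rate": 1e-3, "retrieval_k": 4}
best_score = evaluate(config)
for _ in range(200):
    candidate = propose_edit(config)
    score = evaluate(candidate)
    if score > best_score:                  # keep the edit only if it empirically helps
        config, best_score = candidate, score

print(config, round(best_score, 4))
```

Real systems like DGM make the same bet at much larger scale: the "edit" is a rewrite of the agent's own code, and the evaluation is a full benchmark run.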
Self‑adaptive AI
An AI system that adjusts itself during or after deployment to new tasks, inputs, users, or environments, typically by updating weights/parameters/internal state on the fly (often with guardrails to preserve prior competence).
Think: “I adapt my behaviour/weights when the world changes.”
Example: Self‑Adapting Language Models (SEAL).
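As a rough illustration of the deployment-time idea (a toy sketch, not SEAL's actual method), the snippet below takes a small gradient step on newly encountered data and accepts the update only if performance on a held-back reference set is preserved, i.e. a guardrail against forgetting. The linear model, data, and 5% tolerance are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                                  # deployed model parameters (toy linear model)

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Held-back "competence" set used as a guardrail against forgetting.
X_ref = rng.normal(size=(32, 3))
y_ref = X_ref @ np.array([1.0, -2.0, 0.5])

def adapt(w, X_new, y_new, lr=0.01, tolerance=1.05):
    """One self-adaptive step: update on fresh data, keep it only if prior competence survives."""
    grad = 2 * X_new.T @ (X_new @ w - y_new) / len(y_new)
    w_candidate = w - lr * grad
    # Guardrail: reject updates that degrade the reference loss by more than 5%.
    if loss(w_candidate, X_ref, y_ref) <= tolerance * loss(w, X_ref, y_ref):
        return w_candidate
    return w

# Simulate a stream of slightly drifted data arriving after deployment.
for _ in range(300):
    X_new = rng.normal(size=(8, 3))
    y_new = X_new @ np.array([1.2, -2.0, 0.5])          # the first coefficient has drifted
    w = adapt(w, X_new, y_new)

print(np.round(w, 2))
```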
Self‑play (training regime)
A learning curriculum where agents improve by playing against themselves or past versions, auto‑generating increasingly challenging data.
Think: “I become stronger by being my own opponent.”
Examples: AlphaStar; XLand.
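The simplest self-play scheme is fictitious play: repeatedly best-respond to the empirical mixture of your own past strategies. The toy rock-paper-scissors sketch below is my illustration, unrelated to AlphaStar's or XLand's actual training stacks; the pool of past snapshots plays the role of the "league".

```python
import numpy as np

# Row-player payoff matrix for rock-paper-scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def best_response(opponent_mix):
    """Index of the pure strategy that maximises expected payoff against a mixture."""
    return int(np.argmax(PAYOFF @ opponent_mix))

# Empirical mixture of all past "selves"; start out fully exploitable (always rock).
league_mix = np.array([1.0, 0.0, 0.0])
for t in range(1, 5000):
    action = best_response(league_mix)                 # train against the pool of past selves
    snapshot = np.eye(3)[action]
    league_mix += (snapshot - league_mix) / (t + 1)    # the new snapshot joins the league

print(np.round(league_mix, 3))   # drifts toward the uniform, hard-to-exploit mix (~1/3 each)
```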
Evolutionary AI
A family of search/optimisation methods (selection, mutation, recombination, quality‑diversity) evolving populations of models, programs, or policies to find high‑performing/diverse solutions.
Think: “Many variants compete; the fittest survive and combine.”
Examples: AutoML‑Zero; Enhanced POET; AlphaDev; AlphaTensor; AlphaEvolve.
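A bare-bones genetic algorithm shows the core loop these methods share (a generic textbook sketch, not any of the cited systems): score a population with a fitness function, select the fittest, and build the next generation via recombination and mutation. The bit-string target and rates here are arbitrary.

```python
import random

TARGET = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]                  # toy objective: match this bit string

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    return [1 - g if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                             # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children                       # recombination + mutation

best = max(population, key=fitness)
print(best, fitness(best))
```

Quality-diversity and open-ended methods (e.g., POET) replace the single fitness score with pressure to keep solutions that are both good and behaviourally different.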
| Dimension | Self‑optimising | Self‑adaptive | Self‑play | Evolutionary |
| --- | --- | --- | --- | --- |
| Primary goal | Improve how the system works (code/process) | Maintain/improve under novel conditions | Auto‑curriculum via opponents | Discover strong/diverse solutions |
| What changes | Code, pipelines, prompts, training recipe | Weights/parameters/state in deployment | Opponents/tasks/data distribution | Populations (architectures, code, policies) |
| Mechanism | Meta‑optimisation, tool orchestration | Continual/online learning, self‑edits | RL leagues, fictitious play | GA/ES/QD, open‑ended evolution |
| Typical evidence | Benchmark uplift after self‑edits | Post‑deployment gains without forgetting | Elo/score vs league/past selves | SOTA/novel designs; ablations |
Additional details, papers, and code
2) Self‑Optimising AI: systems that improve themselves
Darwin‑Gödel Machine (DGM) — A self‑improving coding agent that iteratively rewrites its own code and empirically validates changes on coding benchmarks (e.g., SWE‑bench 20.0%→50.0%; Polyglot 14.2%→30.7%). Technical report: arXiv:2505.22954 | Project: Sakana.ai DGM
AlphaEvolve (DeepMind) — LLM‑orchestrated evolutionary coding agent that proposes, verifies, and selects code/algorithmic variants. Deployed improvements include data-centre scheduling heuristics (recovering ~0.7% of compute), TPU circuit simplification, and kernel speedups (e.g., up to 32.5% on a FlashAttention kernel). Notably, it discovered an algorithm that multiplies 4×4 complex‑valued matrices using 48 scalar multiplications (improving on Strassen's 1969 result) and sped up a key Gemini training kernel by 23% (→ ~1% reduction in end‑to‑end training time). Blog: DeepMind blog
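The propose-verify-select loop described above can be sketched as follows. This only mirrors the shape of such a system, not AlphaEvolve itself: `llm_propose_variant` is a hypothetical stand-in for an LLM call (here it just picks between two hand-written implementations), and the evaluator gates on correctness before scoring runtime.

```python
import random
import timeit

def evaluate(source: str) -> float:
    """Verify a candidate, then score it: correctness gate first, then negative runtime."""
    namespace = {}
    exec(source, namespace)                               # trusted toy setting only
    candidate = namespace["sort_list"]
    data = [random.random() for _ in range(500)]
    if candidate(list(data)) != sorted(data):             # automated verification
        return float("-inf")
    return -timeit.timeit(lambda: candidate(list(data)), number=20)

def llm_propose_variant(parent_source: str) -> str:
    """Hypothetical stand-in for an LLM rewriting the parent program.
    Here it simply returns one of two hand-written implementations."""
    bubble = (
        "def sort_list(xs):\n"
        "    for i in range(len(xs)):\n"
        "        for j in range(len(xs) - 1 - i):\n"
        "            if xs[j] > xs[j + 1]:\n"
        "                xs[j], xs[j + 1] = xs[j + 1], xs[j]\n"
        "    return xs\n"
    )
    builtin = "def sort_list(xs):\n    return sorted(xs)\n"
    return random.choice([bubble, builtin])

population = ["def sort_list(xs):\n    return sorted(xs)\n"]
for _ in range(10):
    parent = random.choice(population)                    # select
    child = llm_propose_variant(parent)                   # propose
    if evaluate(child) >= evaluate(parent):               # verify, keep the better variant
        population.append(child)

print(len(population), "surviving variants")
```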
3) Self‑Adaptive AI: systems that update themselves in deployment
SEAL — Self‑Adapting Language Models that generate their own “self‑edits” (finetuning data + update directives) and apply gradient‑based updates during/after use. An outer RL loop rewards edits that improve downstream performance. arXiv:2506.10943 | Code/website: GitHub · Project page
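A loose skeleton of that outer loop, with every model-facing step stubbed out (nothing below comes from the SEAL codebase; `generate_self_edit`, `finetune`, and `downstream_score` are hypothetical placeholders), might look like:

```python
import random

def generate_self_edit(model, context):
    """Placeholder: the model writes its own finetuning data plus an update directive."""
    return {"examples": [f"synthetic training example about {context}"], "lr": 1e-5}

def finetune(model, self_edit):
    """Placeholder: apply the gradient-based update described by the self-edit."""
    return {**model, "version": model["version"] + 1}

def downstream_score(model) -> float:
    """Placeholder: evaluate the (possibly updated) model on a held-out downstream task."""
    return random.random()

model = {"name": "toy-lm", "version": 0}
baseline = downstream_score(model)
for context in ["new clinical guideline", "rare edge case"]:
    self_edit = generate_self_edit(model, context)        # inner loop: the model edits itself
    candidate = finetune(model, self_edit)
    reward = downstream_score(candidate) - baseline       # outer loop: reward useful edits
    if reward > 0:                                        # SEAL reinforces the edit policy;
        model = candidate                                 #   here we simply keep good updates
        baseline += reward

print(model)
```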
Self‑rewarding post‑training — Language models provide their own reward signals (LLM‑as‑a‑Judge) to iteratively improve instruction following without human preference labels. Self‑Rewarding Language Models (Yuan et al., 2024)
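The mechanism can be sketched in a few lines: the model generates several candidate responses, scores them itself as a judge, and the resulting preference pairs feed the next finetuning round. In this toy version (my illustration, not the paper's code) both the generator and the judge are trivial stand-ins.

```python
import random

def generate(prompt: str) -> str:
    """Placeholder generator: returns one of a few canned responses."""
    return random.choice(["short answer", "a detailed, step-by-step answer", "off-topic reply"])

def judge(prompt: str, response: str) -> float:
    """Placeholder LLM-as-a-Judge: here, longer on-topic answers simply score higher."""
    return float(len(response)) if "answer" in response else 0.0

preference_pairs = []
for prompt in ["Explain photosynthesis", "Summarise this report"]:
    candidates = [generate(prompt) for _ in range(4)]
    ranked = sorted(candidates, key=lambda r: judge(prompt, r), reverse=True)
    preference_pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})

# In the paper these self-generated pairs feed an iterative DPO-style finetuning round;
# here we only show the preference data the model awards itself.
print(preference_pairs)
```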
4) Self‑Play: agents that generate their own curriculum
AlphaStar — League‑based self‑play reaches Grandmaster level in StarCraft II (above 99.8% of ranked players). Nature (2019) · Preprint PDF
XLand — Open‑ended self‑play across hundreds of thousands of procedurally generated tasks yields generally capable agents with zero‑shot generalisation to held‑out games; emergent behaviours include tool use and cooperation. arXiv:2107.12808 · DeepMind blog
CICERO (Meta) — Human‑level performance in Diplomacy by combining a language model with strategic planning and self‑play RL. Science (2022) · Code
5) Evolutionary AI: population search, quality‑diversity, and open‑endedness
AutoML‑Zero — Evolutionary search that discovers complete ML algorithms from scratch. arXiv:2003.03384 · ICML 2020
Enhanced POET — Co‑evolves agents and environments to sustain open‑ended learning with transfer between tasks. arXiv:2003.08536 · PMLR PDF · Code
AlphaDev — Reinforcement learning discovers faster sorting algorithms (merged into standard libraries). Nature (2023) · DeepMind blog
AlphaTensor — RL discovers novel matrix multiplication algorithms. Nature (2022) · DeepMind blog · Code/algorithms
Note: I am aware that, without stewardship, we risk systems that optimise the wrong things at unprecedented speed. The next breakthrough isn't just technical; it's the art of guiding self-improving intelligence toward human ends. I will explore the AI risk implications of Phase 3 in an upcoming blog.
About the Author: Christopher Foster-McBride is the founder of tokescompare, originator of the AI Trust Paradox/Verisimilitude Paradox, CEO of Digital Human Assistants, and a public sector CIO.