
Phase 3 of this wave of the AI Revolution: Self‑Optimising, Self‑Adaptive, Self‑Play, and Evolutionary AI – A Contemporary Review (2020–2025)

  • Writer: Christopher Foster-McBride
  • 7 days ago
  • 4 min read


AI is trending toward systems that improve and adapt themselves: models that rewrite code or pipelines (self‑optimising), update their own parameters in deployment (self‑adaptive), learn via leagues of opponents (self‑play), and evolve solutions through population search (evolutionary AI).


Recent exemplars include the Darwin‑Gödel Machine (self‑improving coding agents), DeepMind’s AlphaEvolve (LLM‑orchestrated evolutionary algorithm/code discovery), SEAL (LLMs that self‑edit weights at run‑time), and open‑ended self‑play in XLand. Together, these developments point to a future of open‑ended, self‑driven AI—bringing new requirements for safety and evaluation.


Why read this article? Few realise just how seismic the shift in AI developments has been in the last few months. Phase 3 AI systems don’t just use algorithms—they learn to improve, adapt, and reinvent those algorithms themselves. That shift—from human-tuned AI to self-driven AI—changes the pace, direction, and ownership of progress.


Fundamentally, it means we will have to answer the question: ‘Are humans a limiting factor in AI development?’


What changes in practice (next 3–5 years):

  • Discovery accelerates: Evolutionary and self-optimising loops will routinely uncover faster code, novel algorithms, and unexpected designs across science, engineering, and healthcare—shrinking R&D cycles from months to days.

  • Resilience becomes default: Self-adaptive models update safely in the field, handling drift, new guidelines, or rare edge cases without waiting for quarterly retrains.

  • Learning without labels scales: Self-play and self-reward methods generate rich curricula and feedback, pushing capability into domains where labelled data is scarce (e.g., hospital clinics, public services, safety-critical ops).


Phase 3 is not merely smarter models; it’s compounding improvement. Organisations and societies that master how systems improve themselves—and how to constrain, test, and govern that improvement—will capture outsized gains in productivity, safety, and innovation.


So, let's dive into the definitions and examples you will need to get familiar with!


Self‑optimising AI


  • An AI system that improves its own performance autonomously by changing its code, training recipe, or operating procedure (e.g., algorithmic edits, toolchains, prompts/pipelines, hyperparameters).

  • Think: “I rewrite or reconfigure myself to get better.”

  • Examples: Darwin‑Gödel Machine; DeepMind’s AlphaEvolve. A minimal loop is sketched below.
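To make the loop concrete, here is a minimal illustrative sketch in Python. It is not how DGM or AlphaEvolve actually work: `propose_patch` and `run_benchmark` are hypothetical stand‑ins for an LLM‑driven code editor and an empirical evaluation harness.

```python
import random

def propose_patch(pipeline):
    """Hypothetical stand-in: an LLM or search routine proposes a
    modified version of the system's own code/prompts/config."""
    return pipeline + [f"tweak-{random.randint(0, 999)}"]

def run_benchmark(pipeline):
    """Hypothetical stand-in: empirically score a candidate on a
    held-out task suite (e.g., a coding benchmark)."""
    return random.random()  # replace with a real evaluation harness

# Greedy self-optimisation: propose a change to the system itself and
# keep it only if measured performance improves.
pipeline = ["baseline"]
best = run_benchmark(pipeline)
for _ in range(20):
    candidate = propose_patch(pipeline)
    score = run_benchmark(candidate)
    if score > best:  # empirical validation gates every self-edit
        pipeline, best = candidate, score
print(f"best score: {best:.3f} with {len(pipeline)} components")
```

The key property is that every self‑edit is gated by measurement: the system only keeps changes it can empirically validate.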


Self‑adaptive AI


  • An AI system that adjusts itself during or after deployment to new tasks, inputs, users, or environments, typically by updating weights/parameters/internal state on the fly (often with guardrails to preserve prior competence).

  • Think: “I adapt my behaviour/weights when the world changes.”

  • Example: Self‑Adapting Language Models (SEAL). A minimal update loop is sketched below.
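A minimal sketch of the idea, assuming a toy linear model and NumPy; the guardrail (checking a retained evaluation set before committing an update) is my own illustrative choice, not SEAL’s mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)  # parameters of the deployed model

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Small retained evaluation set: a guardrail against forgetting.
X_old = rng.normal(size=(32, 3))
y_old = X_old @ true_w

def adapt(w, x, y, lr=0.05):
    """One gradient step on a freshly observed example."""
    grad = 2 * (x @ w - y) * x
    return w - lr * grad

for _ in range(200):  # simulated deployment stream
    x = rng.normal(size=3)
    y = x @ true_w + 0.3  # the world has drifted since training
    w_new = adapt(w, x, y)
    # Commit the self-edit only if prior competence is preserved.
    if loss(w_new, X_old, y_old) <= loss(w, X_old, y_old) + 0.1:
        w = w_new
print("adapted weights:", np.round(w, 2))
```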


Self‑play (training regime)


  • A learning curriculum where agents improve by playing against themselves or past versions, auto‑generating increasingly challenging data.

  • Think: “I become stronger by being my own opponent.”

  • Examples: AlphaStar; XLand. A minimal league loop is sketched below.
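A minimal sketch of league‑style self‑play, with `play` and the skill update as hypothetical stand‑ins for a real game and a real learning step:

```python
import random

def play(agent, opponent):
    """Hypothetical stand-in for a full game; stronger skill wins
    more often. Returns True if the agent beats the opponent."""
    return random.random() < agent / (agent + opponent)

skill = 1.0
league = [skill]  # frozen snapshots of past selves
for generation in range(50):
    opponent = random.choice(league)  # play the league, not just the latest
    if play(skill, opponent):
        skill *= 1.05  # crude stand-in for a learning update after a win
    league.append(skill)  # today's agent is tomorrow's opponent
print(f"final skill: {skill:.2f}")
```

Sampling opponents from the whole league of past selves, rather than only the latest agent, is what counters the strategy cycling that plain self‑play can suffer from.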


Evolutionary AI


  • A family of search/optimisation methods (selection, mutation, recombination, quality‑diversity) evolving populations of models, programs, or policies to find high‑performing/diverse solutions.

  • Think: “Many variants compete; the fittest survive and combine.”

  • Examples: AutoML‑Zero; Enhanced POET; AlphaDev; AlphaTensor; AlphaEvolve. A minimal population loop is sketched below.
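A minimal mutation‑plus‑selection sketch (recombination and quality‑diversity are omitted for brevity); `fitness` is a hypothetical stand‑in for any benchmark score over programs, architectures, or policies:

```python
import random

def fitness(genome):
    """Toy objective: maximise the sum of genes (a stand-in for any
    benchmark score over programs, architectures, or policies)."""
    return sum(genome)

def mutate(genome, rate=0.2):
    return [g + random.gauss(0, 1) if random.random() < rate else g
            for g in genome]

population = [[random.gauss(0, 1) for _ in range(8)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)  # selection
    parents = population[:10]                   # survivors
    children = [mutate(random.choice(parents)) for _ in range(20)]
    population = parents + children             # next generation
best = max(population, key=fitness)
print(f"best fitness: {fitness(best):.2f}")
```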


| Dimension | Self‑optimising | Self‑adaptive | Self‑play | Evolutionary |
| --- | --- | --- | --- | --- |
| Primary goal | Improve how the system works (code/process) | Maintain/improve under novel conditions | Auto‑curriculum via opponents | Discover strong/diverse solutions |
| What changes | Code, pipelines, prompts, training recipe | Weights/parameters/state in deployment | Opponents/tasks/data distribution | Populations (architectures, code, policies) |
| Mechanism | Meta‑optimisation, tool orchestration | Continual/online learning, self‑edits | RL leagues, fictitious play | GA/ES/QD, open‑ended evolution |
| Typical evidence | Benchmark uplift after self‑edits | Post‑deployment gains without forgetting | Elo/score vs league/past selves | SOTA/novel designs; ablations |


Additional details, papers, and code


Self‑Optimising AI: systems that improve themselves


  • Darwin‑Gödel Machine (DGM) — A self‑improving coding agent that iteratively rewrites its own code and empirically validates changes on coding benchmarks (e.g., SWE‑bench 20.0%→50.0%; Polyglot 14.2%→30.7%). Technical report: arXiv:2505.22954  | Project: Sakana.ai DGM
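A minimal sketch of the archive‑based flavour of this loop, as I read the DGM report: rather than greedy hill‑climbing, all validated variants stay in an archive that later self‑modifications can branch from. `self_modify` and `evaluate` are hypothetical stand‑ins.

```python
import random

def self_modify(agent):
    """Hypothetical stand-in: the agent rewrites its own code."""
    return agent + random.gauss(0, 0.2)

def evaluate(agent):
    """Hypothetical stand-in: benchmark score clipped to [0, 1]."""
    return max(0.0, min(1.0, agent))

# Keep *all* validated variants in an archive, not just the current
# best, so weaker ancestors can still seed later breakthroughs
# (open-ended search rather than greedy hill-climbing).
archive = [(0.2, evaluate(0.2))]  # (agent, score) pairs
for step in range(100):
    weights = [score + 0.05 for _, score in archive]  # score-biased sampling
    parent, _ = random.choices(archive, weights=weights)[0]
    child = self_modify(parent)
    archive.append((child, evaluate(child)))
print(f"best score in archive: {max(s for _, s in archive):.2f}")
```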


  • AlphaEvolve (DeepMind) — LLM‑orchestrated evolutionary coding agent that proposes, verifies, and selects code/algorithmic variants. Deployed improvements include data‑centre scheduling heuristics (recovering ~0.7% of compute), TPU circuit simplification, and kernel speedups (e.g., FlashAttention up to 32.5%). Notably, it discovered an algorithm that multiplies 4×4 complex‑valued matrices using 48 scalar multiplications (surpassing Strassen, 1969), and sped up a key Gemini training kernel by 23% (≈1% end‑to‑end training‑time reduction). Blog: DeepMind blog
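For context on the matrix‑multiplication claim: Strassen’s 1969 scheme multiplies 2×2 matrices with 7 multiplications instead of 8, and applied recursively to 4×4 matrices it costs 7² = 49 scalar multiplications, so a 48‑multiplication scheme is a genuine improvement. A one‑line check:

```python
# Strassen's 2x2 scheme (7 multiplications instead of 8), applied
# recursively, multiplies 4x4 matrices in 7 * 7 = 49 scalar
# multiplications; AlphaEvolve reports a 48-multiplication scheme.
recursive_strassen_4x4 = 7 ** 2
print(recursive_strassen_4x4, "vs", 48)  # 49 vs 48
```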


Self‑Adaptive AI: systems that update themselves in deployment


  • SEAL — Self‑Adapting Language Models that generate their own “self‑edits” (finetuning data + update directives) and apply gradient‑based updates during/after use. An outer RL loop rewards edits that improve downstream performance. arXiv:2506.10943 | Code/website: GitHub · Project page
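A heavily simplified sketch of that loop’s shape, under my own assumptions: the paper trains the edit generator with reinforcement learning, whereas here beneficial edits are simply kept, and `generate_self_edit`, `apply_update`, and `downstream_score` are hypothetical stand‑ins.

```python
import random

def generate_self_edit(model):
    """Hypothetical stand-in: the model writes its own finetuning data
    plus update directives (a 'self-edit')."""
    return {"data": f"synthetic-{random.randint(0, 99)}", "lr": 3e-5}

def apply_update(model, edit):
    """Hypothetical stand-in for a gradient-based weight update."""
    return model + random.gauss(0.01, 0.05)

def downstream_score(model):
    """Hypothetical stand-in: held-out downstream performance."""
    return model

model = 0.0
for step in range(50):  # outer loop rewarding useful self-edits
    edit = generate_self_edit(model)
    updated = apply_update(model, edit)
    reward = downstream_score(updated) - downstream_score(model)
    if reward > 0:  # keep only edits that improve downstream performance
        model = updated
print(f"final score: {downstream_score(model):.2f}")
```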


  • Self‑rewarding post‑training — Language models provide their own reward signals (LLM‑as‑a‑Judge) to iteratively improve instruction following without human preference labels. Self‑Rewarding Language Models (Yuan et al., 2024)
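A sketch of the iterative shape of this method, under my own simplifications; `generate`, `judge`, and `preference_train` are hypothetical stand‑ins for sampling, LLM‑as‑a‑Judge scoring, and preference optimisation (e.g., DPO):

```python
import random

def generate(model, prompt, n=4):
    """Hypothetical stand-in: sample n candidate responses."""
    return [f"{prompt}/response-{i}" for i in range(n)]

def judge(model, prompt, response):
    """Hypothetical stand-in: the same model scores its own output
    (LLM-as-a-Judge), e.g. against a 0-5 rubric."""
    return random.uniform(0, 5)

def preference_train(model, pairs):
    """Hypothetical stand-in for preference optimisation on
    (chosen, rejected) pairs."""
    return model  # would return updated weights

model = "M0"
prompts = ["p1", "p2", "p3"]
for iteration in range(3):  # each round uses no human labels at all
    pairs = []
    for p in prompts:
        ranked = sorted(generate(model, p), key=lambda r: judge(model, p, r))
        pairs.append((ranked[-1], ranked[0]))  # chosen = best, rejected = worst
    model = preference_train(model, pairs)
```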


Self‑Play: agents that generate their own curriculum


  • AlphaStar — League‑based self‑play reaches Grandmaster level in StarCraft II (above 99.8% of ranked players). Nature (2019) · Preprint PDF


  • XLand — Open‑ended self‑play across hundreds of thousands of procedurally generated tasks yields generally capable agents with zero‑shot generalisation to held‑out games; emergent behaviours include tool use and cooperation. arXiv:2107.12808 · DeepMind blog


  • CICERO (Meta) — Human‑level performance in Diplomacy by combining a language model with strategic planning and self‑play RL. Science (2022) · Code


Evolutionary AI: population search, quality‑diversity, and open‑endedness

  • Key exemplars are listed in the definitions above: AutoML‑Zero; Enhanced POET; AlphaDev; AlphaTensor; AlphaEvolve (detailed under self‑optimising AI).



Note: I am aware that, without stewardship, we risk systems that optimise the wrong things at unprecedented speed. The next breakthrough isn’t just technical; it’s the art of guiding self-improving intelligence toward human ends. I will explore the AI-risk implications of Phase 3 in an upcoming blog.


About the Author: Christopher Foster-McBride is the founder of tokescompare, originator of the AI Trust Paradox/Verisimilitude Paradox, CEO of Digital Human Assistants, and a public sector CIO.

