
Phase 3 of this wave of the AI Revolution: Self‑Optimising, Self‑Adaptive, Self‑Play, and Evolutionary AI – A Contemporary Review (2020–2025)

  • Writer: Christopher Foster-McBride
  • 7 days ago
  • 4 min read


AI is trending toward systems that improve and adapt themselves: models that rewrite code or pipelines (self‑optimising), update their own parameters in deployment (self‑adaptive), learn via leagues of opponents (self‑play), and evolve solutions through population search (evolutionary AI).


Recent exemplars include the Darwin‑Gödel Machine (self‑improving coding agents), DeepMind’s AlphaEvolve (LLM‑orchestrated evolutionary algorithm/code discovery), SEAL (LLMs that self‑edit weights at run‑time), and open‑ended self‑play in XLand. Together, these developments point to a future of open‑ended, self‑driven AI—bringing new requirements for safety and evaluation.


Why read this article? Few realise just how seismic the shift in AI developments has been in the last few months. Phase 3 AI systems don’t just use algorithms—they learn to improve, adapt, and reinvent those algorithms themselves. That shift—from human-tuned AI to self-driven AI—changes the pace, direction, and ownership of progress.


Fundamentally, it means we will have to answer the question: ‘Are humans a limiting factor in AI development?’


What changes in practice (next 3–5 years):

  • Discovery accelerates: Evolutionary and self-optimising loops will routinely uncover faster code, novel algorithms, and unexpected designs across science, engineering, and healthcare—shrinking R&D cycles from months to days.

  • Resilience becomes default: Self-adaptive models update safely in the field, handling drift, new guidelines, or rare edge cases without waiting for quarterly retrains.

  • Learning without labels scales: Self-play and self-reward methods generate rich curricula and feedback, pushing capability into domains where labelled data is scarce (e.g., hospital clinics, public services, safety-critical ops).


Phase 3 is not merely smarter models; it’s compounding improvement. Organisations and societies that master how systems improve themselves—and how to constrain, test, and govern that improvement—will capture outsized gains in productivity, safety, and innovation.


So, let's dive into the definitions and examples you will need to get familiar with!


Self‑optimising AI


  • An AI system that improves its own performance autonomously by changing its code, training recipe, or operating procedure (e.g., algorithmic edits, toolchains, prompts/pipelines, hyperparameters).

  • Think: “I rewrite or reconfigure myself to get better.”

  • Examples: Darwin‑Gödel Machine; DeepMind’s AlphaEvolve. A minimal loop is sketched below.
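To make the loop concrete, here is a minimal illustrative sketch in Python. It is not how DGM or AlphaEvolve actually work: `propose_patch` and `run_benchmark` are hypothetical stand‑ins for an LLM‑driven code editor and an empirical evaluation harness.

```python
import random

def propose_patch(pipeline):
    """Hypothetical stand-in: an LLM or search routine proposes a
    modified version of the system's own code/prompts/config."""
    return pipeline + [f"tweak-{random.randint(0, 999)}"]

def run_benchmark(pipeline):
    """Hypothetical stand-in: empirically score a candidate on a
    held-out task suite (e.g., a coding benchmark)."""
    return random.random()  # replace with a real evaluation harness

# Greedy self-optimisation: propose a change to the system itself and
# keep it only if measured performance improves.
pipeline = ["baseline"]
best = run_benchmark(pipeline)
for _ in range(20):
    candidate = propose_patch(pipeline)
    score = run_benchmark(candidate)
    if score > best:  # empirical validation gates every self-edit
        pipeline, best = candidate, score
print(f"best score: {best:.3f} with {len(pipeline)} components")
```

The key property is that every self‑edit is gated by measurement: the system only keeps changes it can empirically validate.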


Self‑adaptive AI


  • An AI system that adjusts itself during or after deployment to new tasks, inputs, users, or environments, typically by updating weights/parameters/internal state on the fly (often with guardrails to preserve prior competence).

  • Think: “I adapt my behaviour/weights when the world changes.”

  • Example: Self‑Adapting Language Models (SEAL). A minimal update loop is sketched below.
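A minimal sketch of the idea, assuming a toy linear model and NumPy; the guardrail (checking a retained evaluation set before committing an update) is my own illustrative choice, not SEAL’s mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)  # parameters of the deployed model

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Small retained evaluation set: a guardrail against forgetting.
X_old = rng.normal(size=(32, 3))
y_old = X_old @ true_w

def adapt(w, x, y, lr=0.05):
    """One gradient step on a freshly observed example."""
    grad = 2 * (x @ w - y) * x
    return w - lr * grad

for _ in range(200):  # simulated deployment stream
    x = rng.normal(size=3)
    y = x @ true_w + 0.3  # the world has drifted since training
    w_new = adapt(w, x, y)
    # Commit the self-edit only if prior competence is preserved.
    if loss(w_new, X_old, y_old) <= loss(w, X_old, y_old) + 0.1:
        w = w_new
print("adapted weights:", np.round(w, 2))
```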


Self‑play (training regime)


  • A learning curriculum where agents improve by playing against themselves or past versions, auto‑generating increasingly challenging data.

  • Think: “I become stronger by being my own opponent.”

  • Examples: AlphaStar; XLand. A minimal league loop is sketched below.
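A minimal sketch of league‑style self‑play, with `play` and the skill update as hypothetical stand‑ins for a real game and a real learning step:

```python
import random

def play(agent, opponent):
    """Hypothetical stand-in for a full game; stronger skill wins
    more often. Returns True if the agent beats the opponent."""
    return random.random() < agent / (agent + opponent)

skill = 1.0
league = [skill]  # frozen snapshots of past selves
for generation in range(50):
    opponent = random.choice(league)  # play the league, not just the latest
    if play(skill, opponent):
        skill *= 1.05  # crude stand-in for a learning update after a win
    league.append(skill)  # today's agent is tomorrow's opponent
print(f"final skill: {skill:.2f}")
```

Sampling opponents from the whole league of past selves, rather than only the latest agent, is what counters the strategy cycling that plain self‑play can suffer from.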


Evolutionary AI


  • A family of search/optimisation methods (selection, mutation, recombination, quality‑diversity) evolving populations of models, programs, or policies to find high‑performing/diverse solutions.

  • Think: “Many variants compete; the fittest survive and combine.”

  • Examples: AutoML‑Zero; Enhanced POET; AlphaDev; AlphaTensor; AlphaEvolve. A minimal population loop is sketched below.
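A minimal mutation‑plus‑selection sketch (recombination and quality‑diversity are omitted for brevity); `fitness` is a hypothetical stand‑in for any benchmark score over programs, architectures, or policies:

```python
import random

def fitness(genome):
    """Toy objective: maximise the sum of genes (a stand-in for any
    benchmark score over programs, architectures, or policies)."""
    return sum(genome)

def mutate(genome, rate=0.2):
    return [g + random.gauss(0, 1) if random.random() < rate else g
            for g in genome]

population = [[random.gauss(0, 1) for _ in range(8)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)  # selection
    parents = population[:10]                   # survivors
    children = [mutate(random.choice(parents)) for _ in range(20)]
    population = parents + children             # next generation
best = max(population, key=fitness)
print(f"best fitness: {fitness(best):.2f}")
```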


| Dimension | Self‑optimising | Self‑adaptive | Self‑play | Evolutionary |
| --- | --- | --- | --- | --- |
| Primary goal | Improve how the system works (code/process) | Maintain/improve under novel conditions | Auto‑curriculum via opponents | Discover strong/diverse solutions |
| What changes | Code, pipelines, prompts, training recipe | Weights/parameters/state in deployment | Opponents/tasks/data distribution | Populations (architectures, code, policies) |
| Mechanism | Meta‑optimisation, tool orchestration | Continual/online learning, self‑edits | RL leagues, fictitious play | GA/ES/QD, open‑ended evolution |
| Typical evidence | Benchmark uplift after self‑edits | Post‑deployment gains without forgetting | Elo/score vs league/past selves | SOTA/novel designs; ablations |


Additional details, papers, and code


Self‑Optimising AI: systems that improve themselves


  • Darwin‑Gödel Machine (DGM) — A self‑improving coding agent that iteratively rewrites its own code and empirically validates changes on coding benchmarks (e.g., SWE‑bench 20.0%→50.0%; Polyglot 14.2%→30.7%). Technical report: arXiv:2505.22954  | Project: Sakana.ai DGM
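A minimal sketch of the archive‑based flavour of this loop, as I read the DGM report: rather than greedy hill‑climbing, all validated variants stay in an archive that later self‑modifications can branch from. `self_modify` and `evaluate` are hypothetical stand‑ins.

```python
import random

def self_modify(agent):
    """Hypothetical stand-in: the agent rewrites its own code."""
    return agent + random.gauss(0, 0.2)

def evaluate(agent):
    """Hypothetical stand-in: benchmark score clipped to [0, 1]."""
    return max(0.0, min(1.0, agent))

# Keep *all* validated variants in an archive, not just the current
# best, so weaker ancestors can still seed later breakthroughs
# (open-ended search rather than greedy hill-climbing).
archive = [(0.2, evaluate(0.2))]  # (agent, score) pairs
for step in range(100):
    weights = [score + 0.05 for _, score in archive]  # score-biased sampling
    parent, _ = random.choices(archive, weights=weights)[0]
    child = self_modify(parent)
    archive.append((child, evaluate(child)))
print(f"best score in archive: {max(s for _, s in archive):.2f}")
```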


  • AlphaEvolve (DeepMind) — LLM‑orchestrated evolutionary coding agent that proposes, verifies, and selects code/algorithmic variants. Deployed improvements include data‑centre scheduling heuristics (recovering ~0.7% of compute), TPU circuit simplification, and kernel speedups (e.g., FlashAttention up to 32.5%). Notably, it discovered an algorithm that multiplies 4×4 complex‑valued matrices using 48 scalar multiplications (surpassing Strassen, 1969), and sped up a key Gemini training kernel by 23% (≈1% end‑to‑end training‑time reduction). Blog: DeepMind blog
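For context on the matrix‑multiplication claim: Strassen’s 1969 scheme multiplies 2×2 matrices with 7 multiplications instead of 8, and applied recursively to 4×4 matrices it costs 7² = 49 scalar multiplications, so a 48‑multiplication scheme is a genuine improvement. A one‑line check:

```python
# Strassen's 2x2 scheme (7 multiplications instead of 8), applied
# recursively, multiplies 4x4 matrices in 7 * 7 = 49 scalar
# multiplications; AlphaEvolve reports a 48-multiplication scheme.
recursive_strassen_4x4 = 7 ** 2
print(recursive_strassen_4x4, "vs", 48)  # 49 vs 48
```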


Self‑Adaptive AI: systems that update themselves in deployment


  • SEAL — Self‑Adapting Language Models that generate their own “self‑edits” (finetuning data + update directives) and apply gradient‑based updates during/after use. An outer RL loop rewards edits that improve downstream performance. arXiv:2506.10943 | Code/website: GitHub · Project page
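A heavily simplified sketch of that loop’s shape, under my own assumptions: the paper trains the edit generator with reinforcement learning, whereas here beneficial edits are simply kept, and `generate_self_edit`, `apply_update`, and `downstream_score` are hypothetical stand‑ins.

```python
import random

def generate_self_edit(model):
    """Hypothetical stand-in: the model writes its own finetuning data
    plus update directives (a 'self-edit')."""
    return {"data": f"synthetic-{random.randint(0, 99)}", "lr": 3e-5}

def apply_update(model, edit):
    """Hypothetical stand-in for a gradient-based weight update."""
    return model + random.gauss(0.01, 0.05)

def downstream_score(model):
    """Hypothetical stand-in: held-out downstream performance."""
    return model

model = 0.0
for step in range(50):  # outer loop rewarding useful self-edits
    edit = generate_self_edit(model)
    updated = apply_update(model, edit)
    reward = downstream_score(updated) - downstream_score(model)
    if reward > 0:  # keep only edits that improve downstream performance
        model = updated
print(f"final score: {downstream_score(model):.2f}")
```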


  • Self‑rewarding post‑training — Language models provide their own reward signals (LLM‑as‑a‑Judge) to iteratively improve instruction following without human preference labels. Self‑Rewarding Language Models (Yuan et al., 2024)
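A sketch of the iterative shape of this method, under my own simplifications; `generate`, `judge`, and `preference_train` are hypothetical stand‑ins for sampling, LLM‑as‑a‑Judge scoring, and preference optimisation (e.g., DPO):

```python
import random

def generate(model, prompt, n=4):
    """Hypothetical stand-in: sample n candidate responses."""
    return [f"{prompt}/response-{i}" for i in range(n)]

def judge(model, prompt, response):
    """Hypothetical stand-in: the same model scores its own output
    (LLM-as-a-Judge), e.g. against a 0-5 rubric."""
    return random.uniform(0, 5)

def preference_train(model, pairs):
    """Hypothetical stand-in for preference optimisation on
    (chosen, rejected) pairs."""
    return model  # would return updated weights

model = "M0"
prompts = ["p1", "p2", "p3"]
for iteration in range(3):  # each round uses no human labels at all
    pairs = []
    for p in prompts:
        ranked = sorted(generate(model, p), key=lambda r: judge(model, p, r))
        pairs.append((ranked[-1], ranked[0]))  # chosen = best, rejected = worst
    model = preference_train(model, pairs)
```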


Self‑Play: agents that generate their own curriculum


  • AlphaStar — League‑based self‑play reaches Grandmaster level in StarCraft II (above 99.8% of ranked players). Nature (2019) · Preprint PDF


  • XLand — Open‑ended self‑play across hundreds of thousands of procedurally generated tasks yields generally capable agents with zero‑shot generalisation to held‑out games; emergent behaviours include tool use and cooperation. arXiv:2107.12808 · DeepMind blog


  • CICERO (Meta) — Human‑level performance in Diplomacy by combining a language model with strategic planning and self‑play RL. Science (2022) · Code


Evolutionary AI: population search, quality‑diversity, and open‑endedness

  • Key exemplars are listed in the definitions above: AutoML‑Zero; Enhanced POET; AlphaDev; AlphaTensor; AlphaEvolve (detailed under self‑optimising AI).



Note: I am aware that, without stewardship, we risk systems that optimise the wrong things at unprecedented speed. The next breakthrough isn’t just technical; it’s the art of guiding self-improving intelligence toward human ends. I will explore the AI-risk implications of Phase 3 in an upcoming blog.


About the Author: Christopher Foster-McBride is the founder of tokescompare, originator of the AI Trust Paradox/Verisimilitude Paradox, CEO of Digital Human Assistants, and a public sector CIO.

