JOURNAL — JUNE 2026

Training inside the dream

Learned world models make experience cheap. The discipline is making sure the dream stays honest.

One of the most beautiful results of the last decade of machine intelligence research is also one of the simplest to state: an agent can learn a compressed generative model of its environment, train itself largely inside that model — inside its own hallucinated rollouts — and transfer the resulting policy back to reality. Ha and Schmidhuber demonstrated it crisply in 2018; the lineage runs back to recurrent world models for planning in 1990.

The appeal is economic. Real experience is expensive, slow, and occasionally destructive; imagined experience is a forward pass. If a latent dynamics model captures the structure that matters, the agent can rehearse a thousand futures in the time reality grants it one, and reserve physical interaction for the questions imagination cannot settle.

But there is a catch, and the catch is where the real research lives. A learned world model is wrong somewhere — all models are — and a sufficiently capable optimizer will find those wrong places and exploit them, achieving spectacular imagined returns through maneuvers that reality does not honor. The agent discovers adversarial examples against its own understanding.

A policy that exploits its own model's flaws hasn't learned to act. It has learned to cheat a dream.

This failure mode disciplines the entire architecture. The world model must represent its own uncertainty, so the planner knows where the dream is trustworthy. Prediction error must be routed back as a training signal, so the model improves fastest exactly where it is most exploited. And final grades must always come from the environment, never from the simulator — a rule we apply as strictly to ourselves as to our agents.

Done honestly, the loop compounds: better models make imagination more faithful, more faithful imagination makes planning more ambitious, and more ambitious plans probe the model at its weakest points, which improves it again. Done dishonestly — graded inside the dream — the same loop produces systems that are impressive precisely until they touch the world.

We are firmly in the first camp. Imagination is our lever; reality is our referee.