JOURNAL — MAY 2026

Words are not worlds

Language models are eloquent about a world they have never touched. Grounding is not a garnish on intelligence — it is a prerequisite.

The most striking property of large language models is how far eloquence can get you. Trained on humanity's writing, they inherit humanity's descriptions — and descriptions, it turns out, cover an enormous fraction of what we ask intelligent systems to do.

But text is a projection of the world, and a lossy one. It rarely records what every writer took for granted: that unsupported objects fall, that things continue existing when occluded, that you cannot walk through the table between you and the door. A system trained purely on the projection inherits its gaps — eloquent about physics it has never felt, confident about spaces it has never navigated.

This is why we treat spatial and physical grounding as a constituent of intelligence rather than an application of it. A general system must represent scenes with geometry, predict their evolution under intervention, and, as the strictest test, act within them and be corrected by the consequences. Perception-action loops grade understanding in a way no text-only benchmark ever will.

Fluency about reality is not contact with reality.

It is also why world models sit at the center of our research program. A model that can reconstruct a scene from partial observation, simulate its dynamics, and support planning against those dynamics is doing something a description of the scene cannot: it is committing to how the world works, in a form reality can falsify.

None of this diminishes language, which remains the most efficient interface for abstract knowledge ever invented. It is an argument about foundations. Words are how intelligence communicates. Worlds are what intelligence is about. Systems that confuse the two will keep mistaking fluency for understanding — and we would rather build the kind that knows the difference.