RESEARCH DIRECTION

Spatial & Embodied Intelligence

Language models are eloquent but unsituated — fluent describers of a world they have never inhabited. We hold that spatial understanding is not an application of intelligence but a constituent of it: representing scenes with geometry and physics, simulating how they evolve, and closing the loop between perceiving a world and acting within it.

From words to worlds

Text is a lossy projection of reality. Systems trained only on that projection inherit its gaps — no occlusion, no object permanence, no contact dynamics. We work on representations that recover what language discards: metric structure, spatial relations, and the physical constraints that make some described worlds possible and others nonsense.

Reconstruction, generation, simulation

A spatially intelligent system should be able to reconstruct a scene from partial observation, generate coherent extensions of it, and simulate its evolution under intervention — three capabilities that share one underlying world representation. Multimodal world models that unify these are, in our view, a load-bearing component of general intelligence.

The perception-action loop

Embodiment is the forcing function for grounded semantics: an agent that must act on its representations discovers immediately which of them are wrong. We study closed perception-action loops — real and simulated — as both a training signal and the most honest evaluation an intelligent system can face.

WORKING PRINCIPLES

How we hold this work to account.

Words are not worlds

Fluency about reality is not contact with it.

Geometry is load-bearing

Space and physics constrain what intelligence must represent.

Interaction is the exam

Acting in a world grades understanding of it.
