RESEARCH DIRECTION

Spatial & Embodied Intelligence

Language models are eloquent but unsituated — fluent describers of a world they have never inhabited. We hold that spatial understanding is not an application of intelligence but a constituent of it: representing scenes with geometry and physics, simulating how they evolve, and closing the loop between perceiving a world and acting within it.

From words to worlds

Text is a lossy projection of reality. Systems trained only on that projection inherit its gaps — no occlusion, no object permanence, no contact dynamics. We work on representations that recover what language discards: metric structure, spatial relations, and the physical constraints that make some described worlds possible and others nonsense.

Reconstruction, generation, simulation

A spatially intelligent system should be able to reconstruct a scene from partial observation, generate coherent extensions of it, and simulate its evolution under intervention — three capabilities that share one underlying world representation. Multimodal world models that unify these are, in our view, a load-bearing component of general intelligence.

The perception-action loop

Embodiment is the forcing function for grounded semantics: an agent that must act on its representations discovers immediately which of them are wrong. We study closed perception-action loops — real and simulated — as both a training signal and the most honest evaluation an intelligent system can face.
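
As a toy illustration of that claim, here is a minimal sketch of a closed loop in which acting exposes a wrong belief. Everything in it is hypothetical and chosen for brevity: a one-dimensional `World` whose response to commands is scaled by a hidden `true_gain`, and an `Agent` that starts with an incorrect `believed_gain`. It is not a description of any particular system; it only shows how prediction error from acting can serve as both a training signal and an evaluation.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical 1-D world: each command is scaled by a gain the agent cannot see."""
    position: float = 0.0
    true_gain: float = 0.5

    def step(self, command: float) -> float:
        self.position += self.true_gain * command
        return self.position  # the agent's next observation


@dataclass
class Agent:
    """Holds a belief about the gain and revises it from prediction error."""
    believed_gain: float = 1.0   # deliberately wrong at the start
    learning_rate: float = 0.5

    def act(self, position: float, goal: float) -> float:
        # The command that would reach the goal if the belief were correct.
        return (goal - position) / self.believed_gain

    def update(self, command: float, predicted: float, observed: float) -> float:
        error = observed - predicted
        # error / command equals (true_gain - believed_gain); skip tiny
        # commands so floating-point noise does not dominate the ratio.
        if abs(command) > 1e-6:
            self.believed_gain += self.learning_rate * error / command
        return error


def run_loop(steps: int = 20, goal: float = 10.0):
    world, agent = World(), Agent()
    observed = world.position
    errors = []
    for _ in range(steps):
        command = agent.act(observed, goal)                      # decide
        predicted = observed + agent.believed_gain * command     # imagine
        new_observation = world.step(command)                    # act and perceive
        errors.append(abs(agent.update(command, predicted, new_observation)))
        observed = new_observation
    return world, agent, errors


if __name__ == "__main__":
    world, agent, errors = run_loop()
    print(f"final position: {world.position:.6f}")
    print(f"believed gain:  {agent.believed_gain:.4f}")
    print(f"first / last prediction error: {errors[0]:.3f} / {errors[-1]:.3g}")
```

The same quantity does double duty here: the prediction error drives the belief update, and its trajectory over the run is the evaluation. An agent that only described the world would never generate that error.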

WORKING PRINCIPLES

How we hold this work to account.

Words are not worlds

Fluency about reality is not contact with it.

Geometry is load-bearing

Space and physics constrain what intelligence must represent.

Interaction is the exam

Acting in a world grades understanding of it.

CONTINUE EXPLORING

More research directions.

World Models & Latent Imagination

Learning compressed generative models of environment dynamics — and planning inside them before acting in the world.

Self-Supervised Representation Learning

Joint-embedding predictive architectures that learn hierarchical abstractions from raw observation — without labels.

Neurosymbolic Reasoning

Hybrid architectures that combine learned representations with explicit symbol manipulation and verifiable inference.

Intrinsic Motivation & Open-Ended Learning

Curiosity as compression progress: agents that generate their own curricula and allocate compute to their frontier.

Grounding & Calibration

Closed-loop evaluation against reality: held-out prediction, proper scoring, and confidence that means something.

Systems & Cognitive Architecture

Modular architectures — perception, world model, memory, critic, actor — engineered as dependable, measurable systems.

Alignment & Safety

Bounded agency, calibrated honesty, and oversight designed into the architecture — not appended to it.
