RESEARCH DIRECTION

Self-Supervised Representation Learning

Humans and animals acquire common sense from observation, not annotation. We study self-supervised objectives, both predictive and joint-embedding formulations, that let systems learn multi-level representations of the world directly from unlabeled streams. These objectives predict in representation space rather than pixel space, and at multiple time horizons simultaneously.

Prediction in representation space

Pixel-perfect prediction wastes capacity on irrelevant detail — the flutter of every leaf. Joint-embedding predictive architectures sidestep this by predicting abstract representations of future states rather than the states themselves, with energy-based formulations handling the irreducible uncertainty. The model learns what is predictable and represents the rest as latent variables, not noise to be painted.
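As a rough illustration of predicting in representation space, the sketch below computes a prediction error between embeddings rather than between raw observations. Everything in it is an assumption made for illustration: the linear encoders, the squared-error energy, the slowly updated target encoder (`ema_update`), and all names and dimensions are hypothetical stand-ins, not a description of any specific architecture.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for a learned network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def embedding_prediction_loss(context, target, encoder, target_encoder, predictor):
    """Mean squared error between the predicted and actual target embeddings."""
    s_context = linear(encoder, context)       # embed the observed part
    s_target = linear(target_encoder, target)  # embed the future state
    s_pred = linear(predictor, s_context)      # predict the future embedding
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target)) / len(s_target)

def ema_update(target_encoder, encoder, decay=0.99):
    """Assumed design choice: move target weights slowly toward the online encoder."""
    return [[decay * t + (1 - decay) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_encoder, encoder)]

rng = random.Random(0)
obs_dim, emb_dim = 8, 3  # hypothetical sizes: 8 "pixels", 3 abstract features

encoder = random_matrix(emb_dim, obs_dim, rng)
target_encoder = [row[:] for row in encoder]  # starts as a copy of the encoder
predictor = random_matrix(emb_dim, emb_dim, rng)

context = [rng.gauss(0, 1) for _ in range(obs_dim)]  # current observation
target = [rng.gauss(0, 1) for _ in range(obs_dim)]   # future observation

loss = embedding_prediction_loss(context, target, encoder, target_encoder, predictor)
target_encoder = ema_update(target_encoder, encoder)
print(f"representation-space loss: {loss:.4f}")
```

The loss is measured over 3 embedding dimensions instead of 8 observation dimensions, so detail that the encoder discards never has to be predicted. A practical system would also need a mechanism that prevents the embeddings from collapsing to a constant, which this sketch omits.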

Hierarchies of abstraction

Reasoning at a single timescale is a pathology. We build representation stacks in which each level abstracts and predicts over a longer horizon than the one below, from millisecond dynamics at the bottom to task and goal structure at the top, so that planning can natively decompose long-horizon problems into short-horizon subgoals.
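The stacking idea can be sketched with a toy hierarchy in which each level summarizes a fixed number of steps from the level below. Mean pooling is used here purely as an assumed stand-in for a learned abstraction, and `pool`, `build_hierarchy`, `horizon_in_base_steps`, and the stride of 2 are hypothetical names and values chosen for illustration.

```python
def pool(sequence, stride):
    """Average each consecutive group of `stride` vectors into one coarser vector."""
    pooled = []
    for start in range(0, len(sequence) - stride + 1, stride):
        group = sequence[start:start + stride]
        pooled.append([sum(column) / stride for column in zip(*group)])
    return pooled

def build_hierarchy(base_sequence, stride, levels):
    """Level 0 is the raw sequence; each higher level pools the one below it."""
    hierarchy = [base_sequence]
    for _ in range(levels - 1):
        hierarchy.append(pool(hierarchy[-1], stride))
    return hierarchy

def horizon_in_base_steps(level, stride):
    """Number of base-level steps covered by one step at the given level."""
    return stride ** level

# Eight base-level states, each a 2-dimensional vector.
base = [[float(t), 2.0 * t] for t in range(8)]
hierarchy = build_hierarchy(base, stride=2, levels=3)

for level, sequence in enumerate(hierarchy):
    span = horizon_in_base_steps(level, 2)
    print(f"level {level}: {len(sequence)} steps, each spanning {span} base steps")
```

With a stride of 2, one step at level 2 spans four base-level steps, so a predictor operating at that level looks four times further ahead per step than one at the bottom.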

Labels as scaffolding, not foundation

Supervised objectives remain useful for evaluation and steering, but a system whose representations depend on labeled data inherits the coverage limits of its annotators. Our foundations are trained on prediction; supervision is applied sparingly, at the top, where it is cheapest and most meaningful.
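One common way to apply supervision only at the top is to fit a small linear head on features from an encoder that is kept fixed. The sketch below assumes that setup; `frozen_encoder` is a hand-written placeholder for a pretrained model, the perceptron update is just one simple choice of head, and the four labeled examples are made up for illustration.

```python
def frozen_encoder(x):
    """Placeholder for a pretrained encoder; nothing here is updated by labels."""
    return [x[0] + x[1], x[0] - x[1]]

def score(weights, bias, features):
    """Linear score of the head on one feature vector."""
    return sum(w * v for w, v in zip(weights, features)) + bias

def predict(weights, bias, features):
    """Binary decision of the linear head."""
    return 1 if score(weights, bias, features) > 0 else 0

def train_linear_probe(feature_rows, labels, epochs=20, lr=0.1):
    """Fit a perceptron head on fixed features; labels are 0 or 1."""
    weights = [0.0] * len(feature_rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(feature_rows, labels):
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * v for w, v in zip(weights, features)]
            bias += lr * error
    return weights, bias

# A handful of labeled examples: the label is 1 when the first coordinate is larger.
inputs = [[2.0, 1.0], [1.0, 2.0], [3.0, 0.5], [0.5, 3.0]]
labels = [1, 0, 1, 0]

feature_rows = [frozen_encoder(x) for x in inputs]  # computed once, never retrained
weights, bias = train_linear_probe(feature_rows, labels)
predictions = [predict(weights, bias, f) for f in feature_rows]
print(predictions)
```

Only the head's weights and bias depend on the labels, so the amount of annotation needed scales with the size of the head rather than the size of the encoder.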

WORKING PRINCIPLES

How we hold this work to account.

Abstraction over pixels

Predict the representation, not the rendering.

Multiple horizons

Every level of the hierarchy owns its own timescale.

Observation is the curriculum

The world supplies more supervision than any dataset.
