RESEARCH DIRECTION

Grounding & Calibration

Intelligence that is not answerable to reality is just eloquence. Every capability we build is graded in a closed loop: predictions registered in advance, outcomes observed, skill measured against strong baselines under proper scoring rules — with leakage controls and held-out structure that make self-deception expensive.

Reality as the referee

It is easy for a system — or a team — to convince itself it is making progress. We design our evaluation so the world itself does the grading: forecasts are time-gated and pre-registered, test domains are held out by construction, and a gain only counts when it beats the strongest available baseline out-of-sample.
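As an illustration of "a gain only counts when it beats the strongest available baseline out-of-sample", the sketch below computes a Brier skill score for binary forecasts. It is a minimal example under stated assumptions, not a description of the actual evaluation pipeline: the function names, the choice of the Brier score, and the toy numbers are all hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(model_probs, baseline_probs, outcomes):
    """Brier skill score: 1 - BS_model / BS_baseline.

    Positive values mean the model beats the baseline on these outcomes;
    zero or negative values mean the claimed gain does not count.
    """
    bs_model = brier_score(model_probs, outcomes)
    bs_baseline = brier_score(baseline_probs, outcomes)
    if bs_baseline == 0:
        raise ValueError("baseline is perfect; skill score is undefined")
    return 1.0 - bs_model / bs_baseline


# Toy held-out outcomes (assumed data, for illustration only).
outcomes = [1, 0, 1, 1, 0]
baseline = [0.6] * 5                   # hypothetical base-rate baseline
model = [0.8, 0.2, 0.7, 0.9, 0.3]      # hypothetical pre-registered forecasts

skill = skill_score(model, baseline, outcomes)
print(f"skill vs. baseline: {skill:.3f}")
```

The forecasts and the baseline are scored on the same out-of-sample outcomes, so a positive skill score cannot come from an easier test set.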

Calibration as a first-class property

Being right is not enough; a system must know how likely it is to be right. We measure calibration with proper scoring rules and reliability analysis, and we treat miscalibration as a defect on par with inaccuracy — because a confidently wrong system is more dangerous than an uncertain one.
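One way to make "proper scoring rules and reliability analysis" concrete is sketched below: a logarithmic score, a binned reliability table, and an expected calibration error. This is an assumed, simplified setup for binary forecasts; the function names, the equal-width binning, and the bin count are illustrative choices rather than the method used here.

```python
import math


def log_score(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of 0/1 outcomes (a proper scoring rule)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -math.log(p if o == 1 else 1.0 - p)
    return total / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Per non-empty bin: (count, mean stated confidence, observed frequency)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            table.append((len(members), mean_p, freq))
    return table


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Count-weighted average gap between confidence and observed frequency."""
    n = len(probs)
    return sum(
        count / n * abs(mean_p - freq)
        for count, mean_p, freq in reliability_table(probs, outcomes, n_bins)
    )


# A system that says "90%" but is right 3 times out of 4 (toy data).
probs = [0.9, 0.9, 0.9, 0.9]
outcomes = [1, 1, 1, 0]
ece = expected_calibration_error(probs, outcomes)
print(f"log score: {log_score(probs, outcomes):.3f}, ECE: {ece:.2f}")
```

A well-calibrated system has observed frequencies close to stated confidences in every bin, so its calibration error approaches zero as the sample grows.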

Generalization measured by distance

In-distribution performance flatters every architecture. The measurements we optimize are transfer measurements: zero-shot performance on held-out domains, graded by structural distance from anything seen in training. Breadth that has not been tested is breadth that does not exist.
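Grading transfer "by structural distance" could look roughly like the sketch below, which groups zero-shot scores on held-out domains into distance buckets and reports the mean per bucket. The text does not define the distance measure, so a distance normalized to [0, 1] is assumed here, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def transfer_profile(results, n_buckets=3):
    """Mean zero-shot score per distance bucket, nearest bucket first.

    `results` is a list of (distance, score) pairs, one per held-out domain,
    where distance is assumed to be normalized to [0, 1].
    """
    buckets = defaultdict(list)
    for distance, score in results:
        if not 0.0 <= distance <= 1.0:
            raise ValueError("distance must be in [0, 1]")
        idx = min(int(distance * n_buckets), n_buckets - 1)
        buckets[idx].append(score)
    return {idx: sum(s) / len(s) for idx, s in sorted(buckets.items())}


# Toy held-out domains as (distance from training, zero-shot score).
results = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.6), (0.9, 0.3)]
profile = transfer_profile(results)
for bucket, mean_score in profile.items():
    print(f"bucket {bucket}: mean score {mean_score:.2f}")
```

Reporting the whole profile rather than a single average shows how quickly performance decays as test domains move away from the training data.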

WORKING PRINCIPLES

How we hold this work to account.

Predict, then check

Claims about the world are settled by the world.

Proper scoring only

Confidence is graded with rules that reward honesty.

No self-grading

Progress is judged by external, held-out evidence.
