RESEARCH DIRECTION

Alignment & Safety

Capability without alignment is a liability. We treat safety as an architectural discipline: agents whose scope of action is explicit and bounded, whose uncertainty is calibrated and surfaced, whose reasoning is inspectable, and whose deployment is gated on demonstrated understanding rather than demonstrated performance.

Honesty as a design objective

A system that overstates its confidence, hides its uncertainty, or optimizes for appearance over substance is unsafe long before it is powerful. We build anti-deception measures into evaluation itself (proper scoring rules, adversarial probes, and audits that make confident nonsense costly) and apply the same standard to our own research claims.
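The Python sketch below illustrates why proper scoring makes confident nonsense costly. The page does not name specific rules, so the sketch assumes the Brier and logarithmic scores for a binary outcome. The function names (`brier`, `log_loss`, `expected_loss`, `best_report`) are hypothetical. It checks two properties: expected loss is minimized by reporting one's true belief, and a confident wrong answer costs far more than a hedged one.

```python
import math
from typing import Callable

Rule = Callable[[float, int], float]


def brier(q: float, outcome: int) -> float:
    # Brier score for a binary event: squared error between the reported
    # probability q and the realized 0/1 outcome (lower is better).
    return (q - outcome) ** 2


def log_loss(q: float, outcome: int) -> float:
    # Logarithmic score as a loss: minus the log of the probability the
    # forecaster assigned to what actually happened (lower is better).
    return -math.log(q if outcome == 1 else 1.0 - q)


def expected_loss(rule: Rule, q: float, p: float) -> float:
    # Expected loss of reporting q when the event truly occurs with probability p.
    return p * rule(q, 1) + (1.0 - p) * rule(q, 0)


def best_report(rule: Rule, p: float, steps: int = 999) -> float:
    # Grid-search the report that minimizes expected loss. For a proper
    # scoring rule the minimizer is the true belief p itself.
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: expected_loss(rule, q, p))
```

For an event that did not occur, a 0.99 forecast scores 0.9801 under Brier and about 4.61 under log loss. A 0.6 forecast scores 0.36 and about 0.92. The log score grows without bound as the probability placed on the realized outcome approaches zero, which is why it punishes overconfidence hardest.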

Bounded by construction

We prefer agents whose action space, resource envelope, and escalation paths are architectural facts rather than behavioral hopes. Permissions, audit trails, and fail-safes are designed in from the first commit, so oversight is real rather than nominal: a human can always reconstruct what the system did and why.
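One way to make an action space an architectural fact is to route every tool call through a single gateway. The gateway enforces an allowlist and a budget and records every attempt. The Python sketch below is illustrative only and rests on two assumptions: the budget is a simple call count, and the log lives in memory. `Gateway`, `AuditEntry`, and `PermissionDenied` are hypothetical names. A real system would also persist the log and connect the denial paths to human escalation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class PermissionDenied(Exception):
    """Raised when a call falls outside the agent's declared envelope."""


@dataclass
class AuditEntry:
    action: str
    args: Tuple[Any, ...]
    outcome: str


@dataclass
class Gateway:
    # The only callables the agent can reach: its explicit action space.
    tools: Dict[str, Callable[..., Any]]
    # Resource envelope, counted here simply as a number of permitted calls.
    budget: int
    # Record of every attempted call, allowed or not (append-only by convention).
    log: List[AuditEntry] = field(default_factory=list)

    def call(self, action: str, *args: Any) -> Any:
        if action not in self.tools:
            self.log.append(AuditEntry(action, args, "denied: not permitted"))
            raise PermissionDenied(f"{action!r} is not in the action space")
        if self.budget <= 0:
            self.log.append(AuditEntry(action, args, "denied: budget exhausted"))
            raise PermissionDenied("resource budget exhausted")
        self.budget -= 1
        try:
            result = self.tools[action](*args)
        except Exception as exc:
            # Failures are logged too, so the record stays complete.
            self.log.append(AuditEntry(action, args, f"error: {exc!r}"))
            raise
        self.log.append(AuditEntry(action, args, f"ok: {result!r}"))
        return result
```

The log captures denials and errors as well as successes. Reading `gateway.log` therefore shows what the system tried and what happened, even when the agent misbehaves.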

Deferral under uncertainty

A well-calibrated system knows where its competence ends — and the safe response at that boundary is deferral, not improvisation. We study uncertainty-aware decision policies that trade autonomy for oversight exactly when the model's own error bars say they should.
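Here is a minimal Python sketch of such a policy. It assumes the model's uncertainty is summarized by its top-class probability; ensemble spread or interval width could be substituted. The deferral threshold is chosen on held-out calibration data as the lowest threshold whose accepted cases meet a target error rate. `decide`, `pick_threshold`, and `Decision` are hypothetical names. The error bound is an empirical estimate on calibration data, not a guarantee on future inputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    action: str        # "act" (autonomous) or "defer" (hand to a human)
    confidence: float  # the model's top-class probability


def decide(probs: Sequence[float], threshold: float) -> Decision:
    # Act only when confidence clears the threshold; otherwise defer.
    confidence = max(probs)
    return Decision("act" if confidence >= threshold else "defer", confidence)


def pick_threshold(confidences: Sequence[float], correct: Sequence[bool],
                   max_error: float) -> float:
    # Scan candidate thresholds from strictest to loosest and keep the lowest
    # one whose accepted set has empirical error <= max_error. Lower thresholds
    # mean more autonomy; infinity means "always defer".
    best = float("inf")
    for t in sorted(set(confidences), reverse=True):
        accepted = [ok for c, ok in zip(confidences, correct) if c >= t]
        error_rate = 1.0 - sum(accepted) / len(accepted)
        if error_rate <= max_error:
            best = t
    return best
```

Tightening `max_error` raises the threshold and shifts more cases to human review. Loosening it buys autonomy at the cost of more unreviewed errors.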

WORKING PRINCIPLES

How we hold this work to account.

No confident nonsense

Systems should say less when they know less.

Bounded by construction

Safety properties are architectural, not behavioral.

Earn each step

New capability deploys only when it is understood.
