JOURNAL — MAY 2026

Generality is earned, not declared

A system is only as general as the most distant domain it has actually survived.

The word "general" gets used freely in this field. A system handles a few related tasks and generality is declared, as if the property could be conferred by press release. We hold ourselves to a narrower reading: a system is only as general as the most distant domain it has actually survived.

Distance matters. Transferring a skill between two neighboring problems is useful, but it is weak evidence of anything deeper. The interesting question is always the far transfer — whether what a system learned in one corner of the world helps it in a corner it has never seen, under rules it was never shown.

This is why we structure our research around held-out domains. Not held-out examples — held-out kinds of problems. If a capability only appears when the test looks like the training, we treat it as memorization wearing a costume.
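
To make the distinction concrete, here is a minimal sketch of a split by domain rather than by example. It is illustrative only and does not describe any particular evaluation pipeline: the `domain` field, the sample domains, and the function name `leave_one_domain_out` are all assumptions made for the example.

```python
from collections import defaultdict


def leave_one_domain_out(examples):
    """Yield (held_out_domain, train, test) triples.

    Each test set contains every example from exactly one domain, and the
    matching training set contains no example from that domain. The
    "domain" key is a hypothetical label assumed for this sketch.
    """
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)

    # Sort the domain names so the order of the splits is deterministic.
    for held_out in sorted(by_domain):
        train = [
            ex
            for domain, items in by_domain.items()
            if domain != held_out
            for ex in items
        ]
        test = list(by_domain[held_out])
        yield held_out, train, test


# Toy data: three made-up domains with two examples each.
examples = [
    {"domain": "arithmetic", "prompt": "2 + 2"},
    {"domain": "arithmetic", "prompt": "7 * 6"},
    {"domain": "chess", "prompt": "mate in one"},
    {"domain": "chess", "prompt": "best opening reply"},
    {"domain": "poetry", "prompt": "finish the couplet"},
    {"domain": "poetry", "prompt": "scan this line"},
]

splits = list(leave_one_domain_out(examples))
```

A random split over the same six examples would almost always put some arithmetic, chess, and poetry on both sides of the line, which is exactly the case where a test resembles the training. Here the held-out kind of problem never appears in training at all.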

Testing on held-out domains also changes what we celebrate internally. Depth in a single domain is comfortable to build and easy to demonstrate, but its contribution to generality can be hard to see, or absent altogether. Breadth, tested honestly, is harder to fake and harder to win. So breadth is what we count.

None of this makes for dramatic announcements. Earning generality looks like a long series of small, verified transfers, each one slightly farther from home than the last. But we would rather hold a modest claim that survives every test than a grand one that survives none.

Breadth you have not tested is breadth you do not have. We intend to keep testing.