Prediction in representation space
Pixel-perfect prediction wastes capacity on irrelevant detail, such as the flutter of every leaf. Joint-embedding predictive architectures sidestep this by predicting abstract representations of future states rather than the states themselves, and they use an energy-based formulation to handle the irreducible uncertainty. The model learns what is predictable and captures the rest in latent variables instead of treating it as noise to be painted.
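
To make the idea concrete, here is a minimal sketch in plain Python of an energy computed in representation space. Everything in it is an illustrative assumption rather than a reference implementation: the three-number observation (position, velocity, leaf flutter), the hand-set weight matrices W, P and Q, the two candidate latent values, and the function names. A real joint-embedding predictive model would learn its encoder and predictor; training is omitted here.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Hypothetical, hand-set weights (a trained model would learn these).
# Observation layout (assumed): [position, velocity, leaf_flutter].
W = [[1.0, 0.0, 0.0],   # encoder keeps position ...
     [0.0, 1.0, 0.0]]   # ... and velocity, and drops leaf_flutter
P = [[1.0, 1.0],        # predicted position = position + velocity (+ latent)
     [0.0, 1.0]]        # predicted velocity = velocity
Q = [[1.0],             # the latent z shifts the predicted position
     [0.0]]
Z_CANDIDATES = [[-1.0], [1.0]]  # e.g. "veers left" or "veers right"

def encode(obs):
    """Map an observation to its abstract representation."""
    return matvec(W, obs)

def predict(s_x, z):
    """Predict the future representation from the current one and a latent z."""
    return [a + b for a, b in zip(matvec(P, s_x), matvec(Q, z))]

def energy(x, y, z):
    """Squared distance between predicted and actual future representations."""
    s_hat = predict(encode(x), z)
    s_y = encode(y)
    return sum((a - b) ** 2 for a, b in zip(s_hat, s_y))

def free_energy(x, y):
    """Lowest energy over the candidate latents: how compatible y is with x."""
    return min(energy(x, y, z) for z in Z_CANDIDATES)

def pixel_error(y_a, y_b):
    """Squared error in raw observation space, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(y_a, y_b))

x = [0.0, 2.0, 0.37]             # current state
y_left = [1.0, 2.0, -0.81]       # one plausible future (z = -1)
y_right = [3.0, 2.0, 0.12]       # another plausible future (z = +1)
y_left_other = [1.0, 2.0, 0.55]  # same future, different leaf flutter
y_implausible = [10.0, 2.0, 0.0] # a future no latent value explains

e_left = free_energy(x, y_left)
e_right = free_energy(x, y_right)
e_other = free_energy(x, y_left_other)
e_bad = free_energy(x, y_implausible)
raw_gap = pixel_error(y_left, y_left_other)
```

In this toy setup both plausible futures receive zero energy because some latent value accounts for each of them, the implausible future receives a high energy, and changing only the leaf-flutter coordinate leaves the energy unchanged even though the raw squared error between those two observations is nonzero.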
