Compressed spatio-temporal representation
Raw observation streams are too high-dimensional to plan over. We study how generative models can be trained self-supervised to distill perception into compact latent states that preserve exactly the structure needed for prediction and control — and discard the rest. Compression is not a convenience here; the quality of the latent code bounds the quality of every downstream decision.
