LeWorldModel (LeWM)
The theory-and-training cleanup layer for JEPA. Introduces a simpler, more stable training objective and removes many heuristics that previous JEPAs relied on.
Core idea
The first JEPA that trains stably end-to-end from raw pixels using only two loss terms:
- Next-embedding prediction loss (MSE)
- SIGReg regularizer — enforces Gaussian-distributed latent embeddings
No stop-gradient, no exponential moving averages, no pre-trained encoders, no auxiliary supervision.
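The two-term objective above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the names `lewm_loss`, `lam`, and `moment_penalty` are made up here, and the regularizer is reduced to a simple mean/variance penalty as a stand-in for the full SIGReg test described in its own section.

```python
def mse(pred, target):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def moment_penalty(embeddings):
    """Stand-in regularizer: push each embedding dimension toward mean 0
    and variance 1 (the actual SIGReg instead sketches random projections
    and applies a normality test to each one)."""
    n = len(embeddings)
    dims = len(embeddings[0])
    penalty = 0.0
    for d in range(dims):
        col = [e[d] for e in embeddings]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dims

def lewm_loss(pred_next, true_next, batch_embeddings, lam=1.0):
    """Total loss = next-embedding prediction MSE + lam * regularizer.
    `lam` stands for the single tunable loss hyperparameter."""
    pred_term = sum(mse(p, t) for p, t in zip(pred_next, true_next)) / len(pred_next)
    return pred_term + lam * moment_penalty(batch_embeddings)
```

Note that, unlike EMA- or stop-gradient-based recipes, both terms are plain differentiable functions of the same batch of embeddings.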
Key contributions
- Cuts tunable loss hyperparameters from six (in the only prior end-to-end alternative) to one
- 15M parameters, trainable on a single GPU in a few hours
- Plans up to 48x faster than foundation-model-based world models
- Latent space encodes meaningful physical structure, demonstrated by probing for physical quantities
- Surprise evaluation: reliably detects physically implausible events
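The surprise evaluation in the last bullet can be sketched as prediction error in embedding space: a physically implausible event shows up as a spike in the distance between the predicted and the actually observed next embedding. The function names and the thresholding scheme below are illustrative assumptions, not taken from the paper.

```python
def surprise_score(predicted_embedding, observed_embedding):
    """Squared Euclidean distance between the predicted and the observed
    next embedding; large values indicate a 'surprising' transition."""
    return sum((p - o) ** 2 for p, o in zip(predicted_embedding, observed_embedding))

def flag_implausible(predicted_seq, observed_seq, threshold):
    """Return the timesteps whose surprise exceeds a threshold (e.g. a high
    percentile of scores measured on plausible held-out videos)."""
    return [t for t, (p, o) in enumerate(zip(predicted_seq, observed_seq))
            if surprise_score(p, o) > threshold]
```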
SIGReg (Sketched-Isotropic-Gaussian Regularizer)
The key anti-collapse mechanism. Projects embeddings onto random unit-norm directions and applies a univariate normality test (the Epps-Pulley statistic) to each projection, encouraging the full embedding distribution to match an isotropic Gaussian. Simple, scalable, stable.
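A runnable sketch of this mechanism, assuming the textbook closed form of the Epps-Pulley statistic tested against N(0,1); the paper's exact weighting, batching, and slice count may differ, and `num_slices` is an illustrative parameter name.

```python
import math
import random

def epps_pulley(y):
    """Closed-form Epps-Pulley statistic for a 1-D sample `y` against the
    standard normal distribution; smaller values = more Gaussian-looking."""
    n = len(y)
    pair_term = sum(math.exp(-0.5 * (y[j] - y[k]) ** 2)
                    for j in range(n) for k in range(j + 1, n))
    single_term = sum(math.exp(-0.25 * v ** 2) for v in y)
    return 1 + n / math.sqrt(3) + (2.0 / n) * pair_term - math.sqrt(2) * single_term

def sigreg(embeddings, num_slices=16, rng=random):
    """Average the Epps-Pulley statistic over random unit-norm projections
    ('sketching'): low only when the embedding cloud looks isotropic-Gaussian
    along every tested direction."""
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_slices):
        direction = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]
        projected = [sum(e_i * d_i for e_i, d_i in zip(e, direction))
                     for e in embeddings]
        total += epps_pulley(projected)
    return total / num_slices
```

Collapsed embeddings (a batch of identical vectors) project to a point mass on every direction and score far higher than a Gaussian cloud, which is exactly how the regularizer penalizes collapse.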
Significance in the JEPA timeline
Makes JEPA more principled and easier to train. While V-JEPA 2 achieves impressive results with a complex training recipe, LeWM shows that competitive results are possible with a minimal objective, which matters for both adoption and theoretical understanding.
See also
- 2506.09985 (V-JEPA 2) — the complex recipe LeWM simplifies
- collapse-prevention — SIGReg as the core solution
- 2603.14482 (V-JEPA 2.1) — alternative approach, more complex but higher performance