LeWorldModel (LeWM)
The theory-and-training cleanup layer for JEPA. Introduces a simpler, more stable training objective and removes many heuristics that previous JEPAs relied on.
Core idea
The first JEPA that trains stably end-to-end from raw pixels using only two loss terms:
- Next-embedding prediction loss (MSE)
- SIGReg regularizer — enforces Gaussian-distributed latent embeddings
No stop-gradient, no exponential moving averages, no pre-trained encoders, no auxiliary supervision.
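The two-term objective above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the names `lewm_loss`, `lam`, and `moment_penalty` are made up here, and the regularizer is reduced to a simple mean/variance penalty as a stand-in for the full SIGReg test described in its own section.

```python
def mse(pred, target):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def moment_penalty(embeddings):
    """Stand-in regularizer: push each embedding dimension toward mean 0
    and variance 1 (the actual SIGReg instead sketches random projections
    and applies a normality test to each one)."""
    n = len(embeddings)
    dims = len(embeddings[0])
    penalty = 0.0
    for d in range(dims):
        col = [e[d] for e in embeddings]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dims

def lewm_loss(pred_next, true_next, batch_embeddings, lam=1.0):
    """Total loss = next-embedding prediction MSE + lam * regularizer.
    `lam` stands for the single tunable loss hyperparameter."""
    pred_term = sum(mse(p, t) for p, t in zip(pred_next, true_next)) / len(pred_next)
    return pred_term + lam * moment_penalty(batch_embeddings)
```

Note that, unlike EMA- or stop-gradient-based recipes, both terms are plain differentiable functions of the same batch of embeddings.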
Key contributions
- Cuts tunable loss hyperparameters from six (in the only prior end-to-end alternative) to one
- 15M parameters, trainable on a single GPU in a few hours
- Plans up to 48x faster than foundation-model-based world models
- Latent space encodes meaningful physical structure, demonstrated by probing for physical quantities
- Surprise evaluation: reliably detects physically implausible events
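The surprise evaluation in the last bullet can be sketched as prediction error in embedding space: a physically implausible event shows up as a spike in the distance between the predicted and the actually observed next embedding. The function names and the thresholding scheme below are illustrative assumptions, not taken from the paper.

```python
def surprise_score(predicted_embedding, observed_embedding):
    """Squared Euclidean distance between the predicted and the observed
    next embedding; large values indicate a 'surprising' transition."""
    return sum((p - o) ** 2 for p, o in zip(predicted_embedding, observed_embedding))

def flag_implausible(predicted_seq, observed_seq, threshold):
    """Return the timesteps whose surprise exceeds a threshold (e.g. a high
    percentile of scores measured on plausible held-out videos)."""
    return [t for t, (p, o) in enumerate(zip(predicted_seq, observed_seq))
            if surprise_score(p, o) > threshold]
```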
SIGReg (Sketched-Isotropic-Gaussian Regularizer)
The key anti-collapse mechanism. Projects embeddings onto random unit-norm directions and applies a univariate normality test (the Epps-Pulley statistic) to each projection, encouraging the full embedding distribution to match an isotropic Gaussian. Simple, scalable, stable.
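A runnable sketch of this mechanism, assuming the textbook closed form of the Epps-Pulley statistic tested against N(0,1); the paper's exact weighting, batching, and slice count may differ, and `num_slices` is an illustrative parameter name.

```python
import math
import random

def epps_pulley(y):
    """Closed-form Epps-Pulley statistic for a 1-D sample `y` against the
    standard normal distribution; smaller values = more Gaussian-looking."""
    n = len(y)
    pair_term = sum(math.exp(-0.5 * (y[j] - y[k]) ** 2)
                    for j in range(n) for k in range(j + 1, n))
    single_term = sum(math.exp(-0.25 * v ** 2) for v in y)
    return 1 + n / math.sqrt(3) + (2.0 / n) * pair_term - math.sqrt(2) * single_term

def sigreg(embeddings, num_slices=16, rng=random):
    """Average the Epps-Pulley statistic over random unit-norm projections
    ('sketching'): low only when the embedding cloud looks isotropic-Gaussian
    along every tested direction."""
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_slices):
        direction = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]
        projected = [sum(e_i * d_i for e_i, d_i in zip(e, direction))
                     for e in embeddings]
        total += epps_pulley(projected)
    return total / num_slices
```

Collapsed embeddings (a batch of identical vectors) project to a point mass on every direction and score far higher than a Gaussian cloud, which is exactly how the regularizer penalizes collapse.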
Significance in the JEPA timeline
Makes JEPA more principled and easier to train. While V-JEPA 2 achieves impressive results with a complex training recipe, LeWM shows that competitive results are possible with a minimal objective, which matters for both adoption and theoretical understanding.
See also
- 2506.09985 (V-JEPA 2) — the complex recipe LeWM simplifies
- collapse-prevention — SIGReg as the core solution
- 2603.14482 (V-JEPA 2.1) — alternative approach, more complex but higher performance