LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Date: 2026-03-23
Modality: video/pixels
Authors: Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero
Tags: world-model, stability, SIGReg, end-to-end, efficiency, planning

LeWorldModel (LeWM)

LeWM is the theory-and-training cleanup layer for JEPA: it introduces a simpler, more stable training objective and removes many of the heuristics that earlier JEPAs relied on.

[Figure: LeWorldModel architecture]

Core idea

The first JEPA that trains stably end-to-end from raw pixels using only two loss terms:

  1. Next-embedding prediction loss (MSE)
  2. SIGReg, a regularizer that pushes the latent embeddings toward an isotropic Gaussian distribution

No stop-gradient, no exponential moving averages, no pre-trained encoders, no auxiliary supervision.
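
One way to write the two-term objective, as a sketch rather than the paper's exact formulation. The symbols are assumptions introduced here: an encoder f_theta mapping frame x_t to embedding z_t, a predictor g_phi conditioned on an action a_t (action conditioning is assumed because the model is used for planning), a sequence length T, and a single weight lambda standing in for the one remaining tunable loss hyperparameter.

```latex
% Assumed notation: z_t = f_\theta(x_t) is the embedding of frame x_t,
% g_\phi is the latent predictor, a_t the action, \lambda the only loss weight.
\mathcal{L}(\theta, \phi)
  = \underbrace{\frac{1}{T-1} \sum_{t=1}^{T-1}
      \bigl\lVert g_\phi(z_t, a_t) - z_{t+1} \bigr\rVert_2^2}_{\text{next-embedding prediction (MSE)}}
  \; + \; \lambda \,
    \underbrace{\operatorname{SIGReg}\bigl(\{z_t\}_{t=1}^{T}\bigr)}_{\text{anti-collapse regularizer}},
\qquad z_t = f_\theta(x_t).
```

Because there is no stop-gradient and no EMA target network, the gradient of the prediction term flows through both z_t and z_{t+1}. On its own that term is minimized by a constant encoder, which is the collapse the SIGReg term is there to rule out.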

Key contributions

  • Reduces tunable loss hyperparameters from 6 to 1 compared to the only existing end-to-end alternative
  • 15M parameters, trainable on a single GPU in a few hours
  • Plans up to 48x faster than foundation-model-based world models
  • Latent space encodes meaningful physical structure, as shown by probing the embeddings for physical quantities
  • Surprise evaluation: reliably detects physically implausible events
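
The surprise evaluation in the last bullet can be illustrated with a small sketch. It assumes that surprise is measured as the squared error between the predicted and the actually encoded next embedding, which the summary above does not spell out. The function names and the threshold are hypothetical.

```python
def surprise_scores(predicted, observed):
    """Per-step mean squared error between predicted and observed embeddings.

    Assumption: surprise is the latent prediction error; both arguments are
    lists of equal-length embedding vectors, one per time step.
    """
    scores = []
    for p_vec, o_vec in zip(predicted, observed):
        sq_err = sum((p - o) ** 2 for p, o in zip(p_vec, o_vec))
        scores.append(sq_err / len(o_vec))
    return scores


def flag_implausible(scores, threshold):
    """Indices of steps whose surprise exceeds a (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    predicted = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    observed = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # last step deviates
    scores = surprise_scores(predicted, observed)
    print(scores, flag_implausible(scores, threshold=1.0))
```

In practice the threshold would have to be calibrated on physically plausible sequences. The sketch only shows the mechanics.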

SIGReg (Sketched-Isotropic-Gaussian Regularizer)

The key anti-collapse mechanism. Projects embeddings onto random unit-norm directions and applies univariate normality tests (Epps-Pulley statistic) to each projection. Encourages the full embedding distribution to match an isotropic Gaussian. Simple, scalable, stable.
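
The procedure above can be sketched in plain Python as a forward-only illustration. Several details are assumptions rather than facts from the source: the target is taken to be N(0, I), the Epps-Pulley statistic is approximated by trapezoid quadrature of the characteristic-function error over [-3, 3] with 17 knots and a Gaussian weighting window, and 16 directions are drawn. All function names are hypothetical. A training implementation would be vectorized and differentiable in an autodiff framework.

```python
import math
import random


def epps_pulley(samples, t_max=3.0, n_knots=17):
    """Epps-Pulley-style statistic for 1-D samples against N(0, 1).

    Integrates the squared distance between the empirical characteristic
    function and exp(-t^2 / 2), weighted by a Gaussian window (assumed),
    using the trapezoid rule on [-t_max, t_max].
    """
    n = len(samples)
    step = 2.0 * t_max / (n_knots - 1)
    total = 0.0
    for k in range(n_knots):
        t = -t_max + k * step
        # Real and imaginary parts of the empirical characteristic function.
        re = sum(math.cos(t * x) for x in samples) / n
        im = sum(math.sin(t * x) for x in samples) / n
        target = math.exp(-0.5 * t * t)  # characteristic function of N(0, 1)
        window = math.exp(-0.5 * t * t)  # weighting window (assumption)
        err = (re - target) ** 2 + im ** 2
        trapezoid = 0.5 if k in (0, n_knots - 1) else 1.0
        total += trapezoid * err * window * step
    return n * total


def random_unit_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sigreg(embeddings, n_directions=16, seed=0):
    """Average the 1-D statistic over random unit-norm projections."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(n_directions):
        u = random_unit_direction(dim, rng)
        projected = [sum(z_i * u_i for z_i, u_i in zip(z, u)) for z in embeddings]
        total += epps_pulley(projected)
    return total / n_directions


if __name__ == "__main__":
    rng = random.Random(1)
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
    collapsed = [[0.0] * 8 for _ in range(256)]  # every input maps to one point
    print("gaussian:", sigreg(gaussian))
    print("collapsed:", sigreg(collapsed))
```

Collapsed embeddings, where every input maps to the same point, should score far higher than Gaussian ones. That gap is what makes the statistic usable as an anti-collapse penalty. Working on 1-D projections keeps the cost linear in the embedding dimension for each direction.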

Significance in the JEPA timeline

LeWM makes JEPA more principled and easier to train. While V-JEPA 2 achieves impressive results with complex training recipes, LeWM shows that a minimal objective can give competitive results. A smaller recipe lowers the barrier to adoption and makes it easier to understand which ingredients actually matter.

Links

See also

  • 2506.09985 (V-JEPA 2) — the complex recipe LeWM simplifies
  • collapse-prevention — SIGReg as the core solution
  • 2603.14482 (V-JEPA 2.1) — alternative approach, more complex but higher performance