JEPAwiki
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
Date: 2025-11-13
Modality: image
Authors: Randall Balestriero, Yann LeCun
Tags: theory, SIGReg, foundational, heuristic-free, scalable
Source: Full text

LeJEPA

The theoretical foundation paper for the JEPA family. Provides comprehensive theory for JEPAs, identifies the optimal embedding distribution, and introduces the SIGReg regularizer that eliminates all training heuristics.

Core contributions

1. Theoretical result: isotropic Gaussian is optimal

Proves that the isotropic Gaussian N(0, I) is the distribution JEPA embeddings should follow to minimize downstream prediction risk. The target is therefore not an arbitrary design choice but a mathematically derived optimum.

2. SIGReg (Sketched Isotropic Gaussian Regularization)

A novel regularizer that constrains embeddings to reach the ideal distribution. See collapse-prevention for full mathematical details. Key properties:

  • Single trade-off hyperparameter (λ)
  • Linear time and memory complexity
  • Stable across hyperparameters, architectures (ResNets, ViTs, ConvNets), and domains
  • Heuristics-free: no stop-gradient, no teacher-student, no schedulers
  • Distributed-training-friendly, implementable in ~50 lines of code
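To make the "sketched" idea concrete, here is a minimal illustrative sketch of a SIGReg-style penalty: project embeddings onto random 1D directions and penalize the gap between each projection's empirical characteristic function and that of a standard Gaussian. The function name, parameters, and the squared characteristic-function gap are assumptions chosen for illustration (the paper builds the regularizer from a formal goodness-of-fit test); this is not the authors' implementation.

```python
import numpy as np

def sigreg_loss(z, num_slices=16, num_t=17, seed=0):
    """Illustrative SIGReg-style penalty (sketch, not the paper's code).

    z: (N, D) batch of embeddings. Projects onto random unit directions
    ("slices") and penalizes the squared gap between each 1D empirical
    characteristic function and the characteristic function of N(0, 1).
    Cost is linear in batch size and embedding dimension.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    # random unit directions in R^D
    dirs = rng.standard_normal((d, num_slices))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    proj = z @ dirs                        # (N, num_slices) 1D projections
    t = np.linspace(-3.0, 3.0, num_t)      # frequencies for the CF comparison
    tx = proj[..., None] * t               # (N, num_slices, num_t)
    # empirical characteristic function E[exp(i t x)] per slice
    ecf_re = np.cos(tx).mean(axis=0)       # (num_slices, num_t)
    ecf_im = np.sin(tx).mean(axis=0)
    gauss_cf = np.exp(-0.5 * t ** 2)       # CF of N(0, 1), real-valued
    # squared CF gap, averaged over frequencies and slices
    return ((ecf_re - gauss_cf) ** 2 + ecf_im ** 2).mean()
```

A batch already distributed as N(0, I) yields a near-zero penalty, while a collapsed batch (all embeddings identical) yields a clearly larger one, which is the behavior that prevents representation collapse without stop-gradients or teacher networks.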

3. LeJEPA = JEPA predictive loss + SIGReg

The simplest possible JEPA training recipe: just two loss terms. Everything else (EMA, stop-gradient, momentum scheduling, multi-term losses) is unnecessary.
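The two-term recipe can be sketched as follows. For self-containedness, the Gaussian regularizer here is a deliberately simplified stand-in (matching only the batch mean and covariance to N(0, I)); the actual SIGReg matches full 1D distributions via random projections. Function names and the λ default are illustrative assumptions, not the paper's API.

```python
import numpy as np

def simple_gaussian_reg(z):
    """Simplified stand-in for SIGReg (illustration only): penalize the
    batch mean's deviation from 0 and the batch covariance's deviation
    from the identity matrix."""
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    return (mu ** 2).sum() + ((cov - np.eye(z.shape[1])) ** 2).sum()

def lejepa_loss(pred, target, lam=1.0):
    """Sketch of LeJEPA's two-term objective:
    JEPA predictive loss + lam * Gaussian-shaping regularizer."""
    pred_term = ((pred - target) ** 2).mean()  # predictor vs. target-view embeddings
    return pred_term + lam * simple_gaussian_reg(target)
```

With this structure there is nothing else to schedule or tune besides λ: no EMA teacher, no stop-gradient, no multi-term loss balancing.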

Results

  • 79% on ImageNet-1K with ViT-H/14 (linear evaluation, frozen backbone)
  • Validated across 10+ datasets, 60+ architectures, varying scales and domains
  • Domain-specific pretraining consistently outperforms frontier foundation models' transfer learning (demonstrated on Galaxy10 astrophysics dataset)

Relationship to LeWorldModel

LeJEPA provides the theory; LeWorldModel applies it to world models. LeWorldModel uses SIGReg to train stably end-to-end from pixels with only 2 loss terms, achieving 48x faster planning than foundation-model-based world models.

Significance in the JEPA timeline

The theory-and-training cleanup layer. By identifying the optimal embedding distribution and providing a simple regularizer, LeJEPA removed the ad-hoc nature of JEPA training and made it principled, scalable, and accessible.
