EB-JEPA: A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
arXiv: 2602.03604
Date: 2026-02-05
Modality: multi
Authors: Basile Terver, Randall Balestriero, Megi Dervishi, David Fan + 7 more
Tags: library, educational, practical, image, video, planning
Source: Full text
EB-JEPA
An open-source library that makes JEPA accessible for research and education. Provides modular, self-contained implementations of the entire JEPA pipeline — from image SSL to video prediction to action-conditioned world models — all trainable on a single GPU.
Three examples of increasing complexity
- Image representation learning: self-supervised JEPA on images. Achieves 91% probing accuracy on CIFAR-10.
- Video prediction: multi-step prediction in latent space on Moving MNIST. Demonstrates how image SSL principles scale to temporal modeling.
- Action-conditioned world model: learns to predict effects of control inputs. Achieves 97% planning success rate on Two Rooms navigation task.
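All three examples share the same core objective: predict the target encoder's embedding from a context embedding, and measure the error in latent space rather than pixel space. A minimal NumPy sketch of that objective is below; the names (`jepa_loss`, `W_ctx`, `W_tgt`, `W_pred`) are illustrative placeholders, not EB-JEPA's actual API.

```python
# Hypothetical minimal JEPA objective: a context encoder plus a predictor
# tries to match the target encoder's embedding; the loss lives in latent
# space. In real training the target encoder is typically an EMA copy of
# the context encoder and receives no gradient (stop-gradient).
import numpy as np

rng = np.random.default_rng(0)
D_in, D_emb = 32, 8

W_ctx = rng.normal(scale=0.1, size=(D_in, D_emb))  # context encoder weights
W_tgt = W_ctx.copy()                               # target encoder (EMA copy in practice)
W_pred = np.eye(D_emb)                             # predictor, identity init

def jepa_loss(x_ctx, x_tgt):
    """Mean squared distance between predicted and target embeddings."""
    z_ctx = x_ctx @ W_ctx
    z_tgt = x_tgt @ W_tgt      # treated as a constant target (stop-gradient)
    z_hat = z_ctx @ W_pred
    return float(np.mean((z_hat - z_tgt) ** 2))

x = rng.normal(size=(4, D_in))       # a batch of 4 "views"
loss_same = jepa_loss(x, x)          # identical views: loss is zero at init
loss_diff = jepa_loss(x + 1.0, x)    # shifted context view: nonzero loss
```

For the video and world-model examples, the same loss is simply applied over multiple predicted future steps, with actions concatenated to the context embedding in the action-conditioned case.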
Design principles
- Modular architecture: reusable components (encoders, predictors, regularizers, planners) that can be recombined
- Single-GPU training: each example trains in a few hours on one GPU
- Educational: clear documentation and code structure to teach JEPA principles
- Comprehensive ablations: reveals critical importance of each regularization component for preventing collapse
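The collapse the ablations guard against is the trivial solution where the encoder maps every input to the same embedding, driving the prediction loss to zero without learning anything. One common countermeasure (of the VICReg family; shown here as an assumption, not necessarily the exact regularizer EB-JEPA uses) is a hinge penalty on the per-dimension standard deviation of a batch of embeddings:

```python
# Hypothetical variance regularizer: penalize embedding dimensions whose
# batch standard deviation falls below a target, so the encoder cannot
# collapse all inputs to a single point.
import numpy as np

def variance_penalty(z, eps=1e-4, target_std=1.0):
    """Hinge on per-dimension std of a batch of embeddings z (N, D)."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(64, 8))   # spread-out embeddings: small penalty
collapsed = np.zeros((64, 8))        # every input mapped to one point: large penalty
```

Ablating a term like this makes the collapse failure mode immediately visible, which is the kind of diagnosis the library's ablation suite is built to surface.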
Why it matters
Production JEPA codebases (V-JEPA 2, etc.) are designed for large-scale training and are hard to navigate. EB-JEPA bridges the gap between theory and practice, providing a low barrier to entry for the JEPA framework.
Links
See also
- 2511.08544 (LeJEPA) — the theory EB-JEPA implements
- collapse-prevention — the ablations reveal which components matter
- training-recipes — minimal training setups