EB-JEPA: A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
arXiv: 2602.03604
Date: 2026-02-05
Modality: multi
Authors: Basile Terver, Randall Balestriero, Megi Dervishi, David Fan + 7 more
Tags: library, educational, practical, image, video, planning
Source: Full text
EB-JEPA
An open-source library that makes JEPA accessible for research and education. Provides modular, self-contained implementations of the entire JEPA pipeline — from image SSL to video prediction to action-conditioned world models — all trainable on a single GPU.
Three examples of increasing complexity
- Image representation learning: self-supervised JEPA on images. Achieves 91% probing accuracy on CIFAR-10.
- Video prediction: multi-step prediction in latent space on Moving MNIST. Demonstrates how image SSL principles scale to temporal modeling.
- Action-conditioned world model: learns to predict effects of control inputs. Achieves 97% planning success rate on Two Rooms navigation task.
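All three examples share the same core objective: predict the target encoder's embedding from a context embedding, and measure the error in latent space rather than pixel space. A minimal NumPy sketch of that objective is below; the names (`jepa_loss`, `W_ctx`, `W_tgt`, `W_pred`) are illustrative placeholders, not EB-JEPA's actual API.

```python
# Hypothetical minimal JEPA objective: a context encoder plus a predictor
# tries to match the target encoder's embedding; the loss lives in latent
# space. In real training the target encoder is typically an EMA copy of
# the context encoder and receives no gradient (stop-gradient).
import numpy as np

rng = np.random.default_rng(0)
D_in, D_emb = 32, 8

W_ctx = rng.normal(scale=0.1, size=(D_in, D_emb))  # context encoder weights
W_tgt = W_ctx.copy()                               # target encoder (EMA copy in practice)
W_pred = np.eye(D_emb)                             # predictor, identity init

def jepa_loss(x_ctx, x_tgt):
    """Mean squared distance between predicted and target embeddings."""
    z_ctx = x_ctx @ W_ctx
    z_tgt = x_tgt @ W_tgt      # treated as a constant target (stop-gradient)
    z_hat = z_ctx @ W_pred
    return float(np.mean((z_hat - z_tgt) ** 2))

x = rng.normal(size=(4, D_in))       # a batch of 4 "views"
loss_same = jepa_loss(x, x)          # identical views: loss is zero at init
loss_diff = jepa_loss(x + 1.0, x)    # shifted context view: nonzero loss
```

For the video and world-model examples, the same loss is simply applied over multiple predicted future steps, with actions concatenated to the context embedding in the action-conditioned case.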
Design principles
- Modular architecture: reusable components (encoders, predictors, regularizers, planners) that can be recombined
- Single-GPU training: each example trains in a few hours on one GPU
- Educational: clear documentation and code structure to teach JEPA principles
- Comprehensive ablations: reveals critical importance of each regularization component for preventing collapse
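The collapse the ablations guard against is the trivial solution where the encoder maps every input to the same embedding, driving the prediction loss to zero without learning anything. One common countermeasure (of the VICReg family; shown here as an assumption, not necessarily the exact regularizer EB-JEPA uses) is a hinge penalty on the per-dimension standard deviation of a batch of embeddings:

```python
# Hypothetical variance regularizer: penalize embedding dimensions whose
# batch standard deviation falls below a target, so the encoder cannot
# collapse all inputs to a single point.
import numpy as np

def variance_penalty(z, eps=1e-4, target_std=1.0):
    """Hinge on per-dimension std of a batch of embeddings z (N, D)."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(64, 8))   # spread-out embeddings: small penalty
collapsed = np.zeros((64, 8))        # every input mapped to one point: large penalty
```

Ablating a term like this makes the collapse failure mode immediately visible, which is the kind of diagnosis the library's ablation suite is built to surface.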
Why it matters
Production JEPA codebases (V-JEPA 2, etc.) are designed for large-scale training and are hard to navigate. EB-JEPA bridges the gap between theory and practice, providing a low barrier to entry for the JEPA framework.
Links
See also
- 2511.08544 (LeJEPA) — the theory EB-JEPA implements
- collapse-prevention — the ablations reveal which components matter
- training-recipes — minimal training setups