V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
Date: 2026-03-17
Modality: video/image
Authors: Matthew Muckley, Amir Bar, Mido Assran, Koustuv Sinha + 4 more
Tags: dense-features, video, image, robotics, self-supervised-learning, deep-self-supervision
Source: Full text

V-JEPA 2.1

V-JEPA 2.1 is the representation-quality upgrade to V-JEPA 2. It focuses on learning dense, high-quality representations that are spatially structured, semantically coherent, and temporally consistent.

Core idea

Four key ingredients:

  1. Dense Predictive Loss: all tokens, both visible context and masked, contribute to the training loss, encouraging explicit spatial and temporal grounding
  2. Deep Self-Supervision: applies the self-supervised objective hierarchically at multiple intermediate encoder layers
  3. Multi-Modal Tokenizers: support unified training over images and videos
  4. Model and data scaling: strategies for scaling model size and training data effectively
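
The first two ingredients can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the per-token L1 distance, the `context_weight` parameter, the uniform average over layers, and all function names are hypothetical choices made for the example. It contrasts a masked-only loss with a dense loss in which every token contributes, then averages the dense loss over several layers to mimic deep self-supervision.

```python
from typing import Sequence

Vector = Sequence[float]


def l1_distance(pred: Vector, target: Vector) -> float:
    """Mean absolute difference between two feature vectors (assumed distance)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def masked_only_loss(preds: Sequence[Vector], targets: Sequence[Vector],
                     is_masked: Sequence[bool]) -> float:
    """Baseline: average the per-token error over masked tokens only."""
    errors = [l1_distance(p, t)
              for p, t, m in zip(preds, targets, is_masked) if m]
    return sum(errors) / len(errors)


def dense_loss(preds: Sequence[Vector], targets: Sequence[Vector],
               is_masked: Sequence[bool], context_weight: float = 1.0) -> float:
    """Dense variant: every token contributes to the loss.

    context_weight is a hypothetical knob that scales the contribution of
    visible context tokens; 0.0 recovers the masked-only baseline.
    """
    total = 0.0
    weight_sum = 0.0
    for p, t, m in zip(preds, targets, is_masked):
        w = 1.0 if m else context_weight
        total += w * l1_distance(p, t)
        weight_sum += w
    return total / weight_sum


def deep_supervised_loss(layer_preds: Sequence[Sequence[Vector]],
                         layer_targets: Sequence[Sequence[Vector]],
                         is_masked: Sequence[bool]) -> float:
    """Apply the dense loss at several layers and average (assumed uniform)."""
    losses = [dense_loss(p, t, is_masked)
              for p, t in zip(layer_preds, layer_targets)]
    return sum(losses) / len(losses)


# Three tokens with 2-dimensional features; only the first token is masked.
targets = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
preds = [[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
is_masked = [True, False, False]

baseline = masked_only_loss(preds, targets, is_masked)
dense = dense_loss(preds, targets, is_masked)
# Second "layer" predicts its targets perfectly, so it adds zero loss.
deep = deep_supervised_loss([preds, targets], [targets, targets], is_masked)
```

In this toy setup the error on the third (visible) token is invisible to the masked-only baseline but does affect the dense loss, which is the behaviour the Dense Predictive Loss description points at.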

Key results

  • 7.71 mAP on Ego4D (short-term object-interaction anticipation)
  • 40.8 Recall@5 on EPIC-KITCHENS (high-level action anticipation)
  • 20% improvement in real-robot grasping success rate over V-JEPA 2-AC
  • 5.687 ATE (absolute trajectory error, lower is better) on Tartan Drive (robotic navigation)
  • 0.307 RMSE (lower is better) on NYUv2 depth estimation (linear probe)
  • 77.7% on Something-Something-V2 (global recognition)
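
For readers less familiar with two of the metrics above, the following sketch shows how RMSE and Recall@k are typically computed. It is a generic illustration with made-up numbers, not the benchmarks' official evaluation code: the function names are hypothetical, and the official EPIC-KITCHENS protocol may average recall per class rather than per instance as done here.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between predictions and ground truth."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def recall_at_k(scores: Sequence[Sequence[float]],
                labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among the k top-scored classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy depth values: both predictions are off by exactly 1.0.
depth_error = rmse([2.0, 4.0], [1.0, 3.0])

# Toy 3-class scores for two samples, evaluated with k=2.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_recall = recall_at_k(toy_scores, labels=[2, 2], k=2)
```

Lower RMSE means more accurate depth predictions, while higher Recall@k means the correct action appears among the model's top k guesses more often.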

Key difference from V-JEPA 2

V-JEPA 2 demonstrated the world-model capability. V-JEPA 2.1 upgrades feature quality, especially for dense features, making them useful for spatial tasks such as depth estimation, navigation, and grasping. The Dense Predictive Loss, in which all tokens participate, is the main innovation.

Significance in the JEPA timeline

V-JEPA 2.1 closes the gap between global scene understanding (what V-JEPA 2 excels at) and fine-grained spatial understanding (depth, object boundaries, dense correspondence).

Links

See also

  • 2506.09985 (V-JEPA 2) — the world-model predecessor
  • 2603.19312 (LeWorldModel) — alternative approach to stable JEPA training
  • collapse-prevention — deep self-supervision as collapse prevention