JEPAwiki
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
Date: 2024-04-25
Modality: point-cloud
Authors: Ayumu Saito, Prachi Kudeshia, Jiju Poovvancheri
Tags: 3D, point-cloud, self-supervised-learning, few-shot
Source: Full text

Point-JEPA

Adapts JEPA specifically to point cloud data. Avoids raw-space reconstruction and shows that JEPA can work efficiently on geometric representations.

Core idea

Introduces a sequencer that orders point cloud patch embeddings so that spatial proximity is reflected in index proximity; context and target selection can then operate directly on index ranges. The sequencer also enables shared computation between context and target selection, improving efficiency.
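A minimal sketch of one way such a sequencer could work: greedily ordering patch centers by nearest-neighbor hops, so that patches that are close in 3D space end up with nearby indices. The greedy strategy and the choice of starting patch are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def sequence_patches(centers):
    """Greedy nearest-neighbor ordering of patch centers.

    centers: (N, 3) array of patch center coordinates.
    Returns an index permutation in which spatially adjacent
    patches tend to receive adjacent indices.
    """
    n = len(centers)
    visited = np.zeros(n, dtype=bool)
    # Hypothetical starting point: the patch with the smallest
    # coordinate sum (one corner of the shape).
    current = int(np.argmin(centers.sum(axis=1)))
    order = [current]
    visited[current] = True
    for _ in range(n - 1):
        # Distance from the current patch to all others;
        # mask out already-visited patches.
        d = np.linalg.norm(centers - centers[current], axis=1)
        d[visited] = np.inf
        current = int(np.argmin(d))
        order.append(current)
        visited[current] = True
    return np.array(order)

# Patches scattered along the x-axis come back in spatial order.
centers = np.array([[0., 0, 0], [2, 0, 0], [1, 0, 0], [3, 0, 0]])
order = sequence_patches(centers)
```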

Key design choices

  • No reconstruction in input space: predictions happen entirely in latent space, following the JEPA principle
  • No additional modalities required: unlike some 3D SSL methods that need images or text
  • Proximity-based masking: the sequencer enables spatial-aware context/target selection for point clouds
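Given a proximity-preserving ordering, context/target selection reduces to picking contiguous index blocks, since a contiguous index range now corresponds to a spatially coherent region. The sketch below illustrates that idea; block count, block length, and the rule that the context excludes all target patches are assumed parameters, not the paper's exact settings.

```python
import numpy as np

def select_blocks(num_patches, num_targets=4, target_len=8, rng=None):
    """Pick contiguous target blocks in sequenced index order.

    Each target is a run of `target_len` consecutive indices; the
    context is every patch not covered by any target block. Because
    indices follow spatial proximity, each block is a spatially
    coherent region of the shape.
    """
    rng = np.random.default_rng(rng)
    targets = []
    covered = np.zeros(num_patches, dtype=bool)
    for _ in range(num_targets):
        start = int(rng.integers(0, num_patches - target_len + 1))
        block = np.arange(start, start + target_len)
        targets.append(block)
        covered[block] = True
    context = np.flatnonzero(~covered)
    return targets, context

targets, context = select_blocks(64, num_targets=4, target_len=8, rng=0)
```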

Results

  • 93.7% classification accuracy (linear SVM on ModelNet40) — surpasses all other self-supervised models
  • State-of-the-art across all four few-shot learning evaluation frameworks
  • Code: github.com/Ayumu-J-S/Point-JEPA

Significance in the JEPA timeline

One of the key 3D branches. Proves JEPA is not vision-only — it generalizes to geometric representations. Together with 3D-JEPA, establishes JEPA in the 3D domain.
