Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
arXiv: 2404.16432
Date: 2024-04-25
Modality: point-cloud
Authors: Ayumu Saito, Prachi Kudeshia, Jiju Poovvancheri
Tags: 3D, point-cloud, self-supervised-learning, few-shot
Source: Full text
Point-JEPA
Adapts JEPA specifically to point cloud data. Avoids raw-space reconstruction and shows that JEPA can work efficiently on geometric representations.
Core idea
Introduces a sequencer that orders point-cloud patch embeddings so that spatial proximity can be computed and exploited directly from sequence indices during target and context selection. Because both selections operate on the same sequenced order, their computation is shared, improving efficiency.
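A minimal sketch of what such a sequencer could look like: a greedy nearest-neighbor ordering of patch center points, so that adjacent sequence indices correspond to spatially close patches. The starting rule (nearest to the centroid) and the function name are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def sequence_patch_centers(centers: np.ndarray) -> np.ndarray:
    """Greedy nearest-neighbor ordering of patch centers (N, 3).

    Starts from the center nearest the point-cloud centroid (an
    assumption; the paper's exact starting rule may differ) and
    repeatedly appends the nearest unvisited center, so neighboring
    indices end up spatially close.
    """
    n = centers.shape[0]
    order = np.empty(n, dtype=np.int64)
    visited = np.zeros(n, dtype=bool)
    # start from the center closest to the centroid
    cur = int(np.argmin(np.linalg.norm(centers - centers.mean(0), axis=1)))
    for i in range(n):
        order[i] = cur
        visited[cur] = True
        d = np.linalg.norm(centers - centers[cur], axis=1)
        d[visited] = np.inf  # exclude already-sequenced patches
        if i < n - 1:
            cur = int(np.argmin(d))
    return order
```

With patches laid out along a line, e.g. centers at x = 0, 1, 2, 3, the returned order walks outward from the middle, keeping consecutive indices adjacent in space.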
Key design choices
- No reconstruction in input space: predictions happen entirely in latent space, following the JEPA principle
- No additional modalities required: unlike some 3D SSL methods that need images or text
- Proximity-based masking: the sequencer enables spatial-aware context/target selection for point clouds
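Given a sequenced order, proximity-based context/target selection reduces to picking contiguous index runs. The sketch below illustrates this idea; the block count, block length, and function name are placeholders, not the paper's settings.

```python
import numpy as np

def select_targets_and_context(order, num_targets=2, target_len=4, rng=None):
    """Pick target blocks as contiguous runs in the sequenced order
    (each run is therefore spatially local) and use the remaining
    indices as context. Hyperparameters here are illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    n = len(order)
    target_mask = np.zeros(n, dtype=bool)
    targets = []
    for _ in range(num_targets):
        start = int(rng.integers(0, n - target_len + 1))
        block = order[start:start + target_len]  # contiguous run = local patch group
        target_mask[start:start + target_len] = True
        targets.append(block)
    context = order[~target_mask]  # everything not covered by a target block
    return targets, context
```

Because both targets and context are carved out of the same precomputed order, no per-selection neighbor search is needed, which is the shared-computation benefit the sequencer provides.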
Results
- 93.7% classification accuracy with a linear SVM on ModelNet40, surpassing other self-supervised models
- State-of-the-art across all four few-shot learning evaluation frameworks
- Code: github.com/Ayumu-J-S/Point-JEPA
Significance in the JEPA timeline
One of the key 3D branches. Shows that JEPA is not vision-only: it generalizes to geometric representations. Together with 3D-JEPA, it establishes JEPA in the 3D domain.
Links
See also
- 2409.15803 (3D-JEPA) — broader 3D representation learning
- 2301.08243 (I-JEPA) — the image architecture it adapts
- masking-strategies — sequencer-based masking for point clouds