Zing Forum

Neuro-JEPA: A Foundation Model for Sparse Latent Variable Prediction in Multimodal Neuroimaging

The NYU Medical Machine Learning Lab open-sourced Neuro-JEPA, applying the JEPA architecture to neuroimaging analysis and enabling self-supervised learning of multimodal brain images via sparse latent variable prediction.

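Tags: Neuro-JEPA · neuroimaging · self-supervised learning · JEPA · multimodal · sparse representation · medical imaging · brain imaging · deep learning · representation learning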
Published 2026-06-13 05:58 · Recent activity 2026-06-13 06:21 · Estimated read 8 min

Section 01

[Introduction] Neuro-JEPA: Open-Source Foundation Model for Sparse Latent Variable Prediction in Multimodal Neuroimaging

The NYU Medical Machine Learning Lab (NYUMedML) open-sourced Neuro-JEPA on GitHub on June 12, 2026. The model applies the Joint Embedding Predictive Architecture (JEPA) to neuroimaging analysis and learns self-supervised representations of multimodal brain images by predicting sparse latent variables. It is tailored to the characteristics of neuroimaging data and supports modalities such as MRI, fMRI, and PET. Its goals are to address scarce labeled data and difficult cross-modal alignment, and to provide high-quality representations for downstream tasks such as brain region segmentation and disease classification.

Section 02

[Background] Challenges in Neuroimaging Analysis and Introduction of the JEPA Architecture

Neuroimaging analysis faces three recurring challenges: scarce labeled data, difficult inter-modal alignment, and very high-dimensional data. Traditional supervised learning depends on large amounts of manual annotation, which is costly because it requires trained physicians. Self-supervised learning (SSL) offers a way out by learning representations from unlabeled data through pretext tasks. The JEPA family (e.g., I-JEPA for images and V-JEPA for video) has been successful in computer vision. Its core idea is to predict representations in latent space rather than pixels, which avoids the drawbacks of pixel-level reconstruction (spending capacity on redundant low-level detail, and high computational cost) and favors learning semantic features. Neuro-JEPA carries this idea into neuroimaging and adapts it to the domain.
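To make "predict in latent space, not pixels" concrete, here is a toy NumPy sketch of one JEPA-style loss computation. It is not Neuro-JEPA's code: the linear encoders, the mean-pooling predictor, the 12/4 context/target split, and the EMA momentum of 0.996 are all illustrative assumptions. I-JEPA-style models instead use Vision Transformers, a predictor conditioned on target positions, and an exponential-moving-average (EMA) target encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 16 patches of 32 voxel values each, embedded into 8 dimensions.
NUM_PATCHES, PATCH_DIM, EMBED_DIM = 16, 32, 8
patches = rng.normal(size=(NUM_PATCHES, PATCH_DIM))

# Hypothetical linear encoders and predictor (real JEPA models use Vision Transformers).
W_context = rng.normal(scale=0.1, size=(PATCH_DIM, EMBED_DIM))
W_target = W_context.copy()   # target encoder is initialized as a copy of the context encoder
W_pred = np.eye(EMBED_DIM)    # toy predictor

# Randomly split patches into visible context (12) and masked targets (4).
perm = rng.permutation(NUM_PATCHES)
context_idx, target_idx = perm[:12], perm[12:]


def jepa_loss(W_ctx, W_tgt, W_p):
    """Mean squared error between predicted and actual target embeddings."""
    z_ctx = patches[context_idx] @ W_ctx   # encode visible patches only
    z_tgt = patches[target_idx] @ W_tgt    # target embeddings (treated as constants in training)
    # Toy predictor: map the mean context embedding to every target position.
    z_hat = np.tile(z_ctx.mean(axis=0) @ W_p, (len(target_idx), 1))
    return float(np.mean((z_hat - z_tgt) ** 2))  # loss computed in latent space, not pixel space


def ema_update(W_tgt, W_ctx, momentum=0.996):
    """Target encoder follows the context encoder as an exponential moving average."""
    return momentum * W_tgt + (1.0 - momentum) * W_ctx


print(jepa_loss(W_context, W_target, W_pred))
```

The loss compares 8-dimensional embeddings rather than 32-value patches, so the model is never asked to reproduce voxel-level noise. In real training only the context encoder and predictor receive gradients; the target encoder is updated with an EMA step like `ema_update`.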

Section 03

[Methodology] Architectural Design of Neuro-JEPA and Sparse Latent Variable Prediction Mechanism

Neuro-JEPA adapts JEPA to neuroimaging in four ways:
1. 3D volume processing: 3D patch division and attention mechanisms capture anatomical correlations across slices.
2. Multimodal fusion: a modality-agnostic representation space lets the model learn features shared across modalities.
3. Sparse latent variable prediction (the core innovation): the model predicts sparse latent activations, which is intended to improve interpretability, efficiency, and generalization. Sparsity is enforced through L1 regularization or gating mechanisms.
4. Anatomical structure awareness: anatomical priors (e.g., brain region segmentation maps) guide the model toward anatomically meaningful representations.

The sparsity-constrained objective is:

L = ‖z_t − Decoder(h)‖₂² + λ‖h‖₁

where z_t is the target latent representation, h is the vector of sparse latent variables, and λ controls the degree of sparsity.
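The sparsity-constrained objective can be written out directly. This sketch assumes a linear decoder (`W_dec`) and averaging over the batch, neither of which the source specifies. `topk_gate` is a hypothetical illustration of the gating alternative: it keeps only the k strongest activations per sample instead of penalizing all of them.

```python
import numpy as np


def sparse_jepa_loss(z_t, h, W_dec, lam=1e-3):
    """L = ||z_t - Decoder(h)||_2^2 + lam * ||h||_1, averaged over the batch.

    z_t:   (batch, d_target) target latent representations
    h:     (batch, d_latent) sparse latent variables
    W_dec: (d_latent, d_target) weights of an assumed linear decoder
    """
    recon = h @ W_dec                                   # Decoder(h)
    pred_term = np.sum((z_t - recon) ** 2, axis=1)      # squared L2 prediction error
    sparsity_term = lam * np.sum(np.abs(h), axis=1)     # L1 penalty pushes entries of h toward 0
    return float(np.mean(pred_term + sparsity_term))


def topk_gate(h, k):
    """Hard gating: keep the k largest-magnitude activations per row, zero the rest."""
    idx = np.argsort(-np.abs(h), axis=1)[:, :k]
    mask = np.zeros(h.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, h, 0.0)


rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 8))
h = topk_gate(rng.normal(size=(4, 32)), k=4)   # at most 4 active latents per sample
W_dec = rng.normal(scale=0.1, size=(32, 8))
print(sparse_jepa_loss(z_t, h, W_dec, lam=0.01))
```

For example, `topk_gate(np.array([[0.5, -2.0, 0.1, 1.5]]), 2)` keeps -2.0 and 1.5 and zeros the rest. The L1 term shrinks all activations smoothly, with strength tuned by λ. Hard gating instead fixes the number of active units exactly.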

Section 04

[Evidence] Experimental Validation: Performance of Neuro-JEPA in Downstream Tasks

Neuro-JEPA was pre-trained on large-scale datasets including ADNI, UK Biobank, and ABCD. Reported downstream results:
1. Brain region segmentation: on FreeSurfer segmentation tasks, the Dice coefficient improved by 3-5% over MAE (masked autoencoder) pre-training and by more than 15% over random initialization.
2. Disease diagnosis: Alzheimer's disease classification on ADNI reached an AUC of 0.92, outperforming existing self-supervised methods.
3. Cross-modal transfer: models pre-trained on structural MRI still performed well when transferred to fMRI tasks.
Ablation experiments confirmed the contribution of each design choice: removing the sparsity constraint lowered performance by 5%, 3D processing outperformed 2D slices, and multimodal pre-training outperformed unimodal pre-training.
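Here are NumPy versions of the standard definitions of the two headline metrics. They are generic implementations, not the evaluation code from the Neuro-JEPA repository. `dice_coefficient` accepts masks of any shape, including 3D volumes.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any shape (e.g. one brain region)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty (the result is then 0).
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


def roc_auc(labels, scores):
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Take the masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]`. They share one voxel out of two each, so Dice is 0.5 (up to the tiny `eps` term). Now take labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]`. Three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.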

Section 05

[Applications & Open Source] Downstream Task Applications and Open-Source Resources of Neuro-JEPA

Downstream applications include brain region segmentation, disease classification, image registration, and generative tasks such as missing-modality completion. The open-source codebase is modular. It covers data processing (with NIfTI/CIFTI support), the model implementation (a PyTorch-based 3D ViT with a sparse predictor), pre-training scripts, and downstream task examples. Pre-trained weights for ADNI and UK Biobank are provided. The typical workflow is: environment setup → data processing → pre-training (or loading pre-trained weights) → fine-tuning.
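The first step of a 3D ViT, dividing a volume into cubic patches, can be sketched in NumPy. The function name `patchify_3d` is hypothetical. The sketch also assumes each volume has already been cropped or padded to a multiple of the patch size. The repository's own data-processing API may differ.

```python
import numpy as np


def patchify_3d(volume, patch_size):
    """Split a (D, H, W) volume into non-overlapping p x p x p cubes, each flattened.

    Returns an array of shape (num_patches, p**3), patches ordered depth-major.
    """
    d, h, w = volume.shape
    p = patch_size
    if d % p or h % p or w % p:
        raise ValueError("volume dimensions must be divisible by patch_size")
    # (D, H, W) -> (D/p, p, H/p, p, W/p, p): separate grid axes from within-patch axes.
    grid = volume.reshape(d // p, p, h // p, p, w // p, p)
    # Move grid axes first: (D/p, H/p, W/p, p, p, p).
    grid = grid.transpose(0, 2, 4, 1, 3, 5)
    return grid.reshape(-1, p * p * p)


vol = np.random.default_rng(0).normal(size=(96, 96, 96)).astype(np.float32)
tokens = patchify_3d(vol, 16)
print(tokens.shape)  # (216, 4096)
```

A 96 × 96 × 96 volume with `patch_size=16` yields 216 patches of 4,096 voxels each. A linear projection would then embed each patch as a ViT token. Unlike 2D slicing, each token here spans several adjacent slices.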

Section 06

[Contributions & Outlook] Innovations of Neuro-JEPA and Future Research Directions

Innovations:
1. The first systematic application of JEPA to neuroimaging.
2. A sparse latent variable prediction mechanism.
3. A unified multimodal representation.

Limitations: high computational resource requirements, limited handling of data heterogeneity, and coverage of only a narrow range of downstream tasks.

Future directions: cross-dataset pre-training, temporal modeling to capture disease progression, fusion with clinical data (imaging plus genomics or cognitive tests), and tools for improving interpretability.

Section 07

[Summary] Significance of Neuro-JEPA for Self-Supervised Learning in Neuroimaging

By combining the JEPA architecture with sparse latent variable prediction, Neuro-JEPA learns high-quality representations of multimodal brain images and offers a practical tool for neuroimaging analysis. The release of the code and pre-trained models is expected to encourage follow-up research and speed up the clinical adoption of neuroimaging AI.