# OCTCube-M: 3D Multimodal OCT Foundation Model for Retinal and Systemic Diseases

> OCTCube-M is a 3D optical coherence tomography (OCT)-based multimodal foundation model that demonstrates exceptional disease prediction capabilities in cross-cohort, cross-device, and cross-modal validations, opening up new avenues for AI-driven ophthalmic diagnosis.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-12T20:10:37.000Z
- Last activity: 2026-05-12T20:19:24.586Z
- Popularity: 148.8
- Keywords: OCT, Foundation Model, Retinal Disease, Medical AI, Multimodal, Computer Vision, Deep Learning
- Page link: https://www.zingnex.cn/en/forum/thread/octcube-m-oct
- Canonical: https://www.zingnex.cn/forum/thread/octcube-m-oct
- Markdown source: floors_fallback

---

## [Introduction] OCTCube-M: Breakthrough Progress in 3D Multimodal OCT Foundation Models

OCTCube-M is a multimodal foundation model built on 3D optical coherence tomography (OCT), developed by a team at the University of Washington. It shows strong disease prediction performance in cross-cohort, cross-device, and cross-modal validation. Beyond state-of-the-art results in retinal disease diagnosis, it also predicts systemic diseases affecting other organs. The model is open source and opens up new avenues for AI-driven ophthalmic diagnosis.

## Background: Challenges in AI-driven Ophthalmic Diagnosis and the Birth of OCTCube-M

Optical coherence tomography (OCT) is a core tool in modern ophthalmic diagnosis, enabling non-invasive acquisition of high-resolution 3D retinal images. Extracting clinically useful information from such complex 3D data, however, remains a key challenge for AI applications. OCTCube-M targets this problem as a foundation model pre-trained on 3D OCT volumes. Its code is open-sourced on GitHub, making it a resource for both ophthalmic AI research and clinical applications.

## Model Architecture and Technical Implementation

OCTCube-M was pre-trained on more than 26,685 3D OCT volumes, comprising 1.62 million 2D images. It adopts a Vision Transformer architecture and uses Flash Attention to improve efficiency. The project is implemented with PyTorch 2.1.0 and CUDA 11.8, and supports Docker deployment to lower the barrier to use.
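To see why attention efficiency matters for 3D input, here is a minimal sketch that counts how many patch tokens a Vision Transformer would see for a whole volume compared with a single 2D slice. The volume size and patch size are hypothetical values chosen for illustration (the 60-slice depth roughly matches the average implied by the figures above, 1.62 million images over 26,685 volumes); they are not taken from the OCTCube-M code.

```python
def count_tokens(volume_shape, patch_shape):
    """Number of non-overlapping 3D patches (tokens) in a volume."""
    if len(volume_shape) != 3 or len(patch_shape) != 3:
        raise ValueError("expected (depth, height, width) triples")
    tokens = 1
    for size, patch in zip(volume_shape, patch_shape):
        if size % patch != 0:
            raise ValueError(f"{size} is not divisible by patch size {patch}")
        tokens *= size // patch
    return tokens


def attention_matrix_entries(num_tokens):
    """Entries in one full self-attention score matrix (quadratic in tokens)."""
    return num_tokens * num_tokens


# Hypothetical sizes for illustration only; not taken from the OCTCube-M code.
volume_shape = (60, 256, 256)  # (slices, height, width)
patch_shape = (3, 16, 16)      # (slices per patch, patch height, patch width)

tokens_3d = count_tokens(volume_shape, patch_shape)
tokens_2d = count_tokens((1, 256, 256), (1, 16, 16))

print("tokens per 3D volume:", tokens_3d)
print("tokens per 2D slice:", tokens_2d)
print("attention entries, 3D:", attention_matrix_entries(tokens_3d))
print("attention entries, 2D:", attention_matrix_entries(tokens_2d))
```

Under these assumed sizes the volume yields 20 times as many tokens as a slice, so a naively stored attention score matrix is 400 times larger. Flash Attention computes the same exact attention without materializing that full matrix in GPU memory, which is one plausible reason it suits 3D inputs.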

## Performance Evidence: Retinal Disease and Cross-organ Prediction Capabilities

1. Retinal diseases: It achieves the best performance in predicting 8 common diseases (AMD, DME, POAG, DR, ERM, CRAO/CRVO, VD, RNV) and supports multi-task classification to improve diagnostic efficiency.
2. Cross-organ capabilities: It can predict systemic diseases such as lung nodule malignancy, reduced cardiac ejection fraction, diabetes, and hypertension.
3. Multimodal variants: OCTCube-IR enables OCT and infrared image retrieval, while OCTCube-EF integrates multimodal data to predict GA growth rate.
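As an illustration of what predicting several retinal diseases at once can look like, the sketch below treats the 8 diseases listed above as independent yes/no labels and applies a sigmoid to one logit per disease. This multi-label formulation, the `predict_diseases` helper, the 0.5 threshold, and the example logits are all assumptions for illustration; the source does not specify how the OCTCube-M multi-task head is built.

```python
import math

# Disease labels as listed in the text above.
DISEASES = ["AMD", "DME", "POAG", "DR", "ERM", "CRAO/CRVO", "VD", "RNV"]


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_diseases(logits, threshold=0.5):
    """Map one logit per disease to independent probabilities and flags.

    Hypothetical helper: each disease is scored on its own, so a single
    scan can be flagged for several diseases or for none.
    """
    if len(logits) != len(DISEASES):
        raise ValueError("expected one logit per disease")
    probs = {name: sigmoid(z) for name, z in zip(DISEASES, logits)}
    flagged = [name for name, p in probs.items() if p >= threshold]
    return probs, flagged


# Made-up logits standing in for the output of a classification head.
example_logits = [2.0, -1.5, 0.3, -3.0, -0.2, -4.0, -2.5, -1.0]
probs, flagged = predict_diseases(example_logits)

for name in DISEASES:
    print(f"{name}: {probs[name]:.2f}")
print("flagged:", flagged)
```

Scoring each label independently, rather than with a softmax over all eight, lets one volume carry more than one diagnosis, which is common in clinical practice.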

## Usage Guide and Open-source Ecosystem

- Quick start: Download pre-trained weights and sample data, then perform inference via Jupyter Notebook.
- Data preparation: Supports initialization with pre-trained models like RETFound and provides scripts for processing public datasets.
- Open-source resources: Model weights are released on Hugging Face, including the original model, bimodal model, and multi-task classification model. Complete code is provided, and a saliency map generation tool is planned for release.

## Clinical Significance and Future Plans

Clinical significance: Automated, high-precision screening can reduce the burden on doctors; cross-device compatibility facilitates commercial deployment; and OCT may become part of whole-body health assessment.

Future plans: The team plans to release a trimodal OCTCube-EF model, OCTCube-IR inference code, and saliency map generation code to improve practicality and interpretability.

## Conclusion: The Potential and Contributions of OCTCube-M

OCTCube-M represents recent progress in foundation models for ophthalmic imaging. Through large-scale pre-training, a multimodal architecture, and cross-cohort, cross-device, and cross-modal validation, it advances retinal disease diagnosis and demonstrates the potential of foundation models in medical imaging. It is expected to become important infrastructure for AI-driven ophthalmic diagnosis.
