Fundus-R1: A Knowledge-Aware Multimodal Large Model for Fundus Image Analysis Trained on Public Data

This article introduces the Fundus-R1 model, the first multimodal large model for fundus image analysis trained exclusively on public datasets. Using retrieval-augmented generation (RAG) to generate knowledge-aware reasoning chains and reinforcement learning with verifiable rewards (RLVR) enhanced by process rewards, it outperforms general-purpose models on multiple benchmarks.

Tags: Fundus-R1, fundus image analysis, multimodal large model, RAG, reinforcement learning, medical AI, public-data training, knowledge-aware reasoning
Published 2026-04-09 22:55 · Recent activity 2026-04-10 10:18 · Estimated read: 4 min

Section 01

[Introduction] Fundus-R1: The First Knowledge-Aware Multimodal Large Model for Fundus Images Trained on Public Data

Fundus-R1 is the first multimodal large model for fundus image analysis trained exclusively on public datasets. It uses RAG to build knowledge-aware reasoning chains and RLVR enhanced by process rewards, and it outperforms general-purpose models on multiple benchmarks. By removing the reliance of existing fundus MLLMs on internal data, it offers a new path toward the democratization of medical AI.


Section 02

[Background] Importance of Fundus Diagnosis and Data Barriers of Existing Methods

Fundus imaging is a core method for ophthalmic disease screening, but a shortage of specialist doctors leaves screening coverage low. Existing high-performance fundus MLLMs rely on internal datasets, hindering research reproducibility; moreover, 94% of public datasets carry only image-level labels, and this lack of fine-grained annotation limits model training.


Section 03

[Methodology] Two Key Technical Innovations of Fundus-R1

1. RAG-driven reasoning chains: extract visual features → retrieve from a medical knowledge base → construct a reasoning chain from findings to diagnosis, providing an interpretable basis and a supervision signal.
2. Process-reward-enhanced RLVR: evaluate the logical coherence and knowledge correctness of each reasoning chain, incentivizing the generation of rigorous, reliable diagnostic reports.
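The two steps above can be sketched together as a toy pipeline: findings are grounded against a knowledge base to form a chain, and a process reward scores both the knowledge grounding of the chain and the final diagnosis. Every name, data structure, and the 50/50 reward weighting here is a hypothetical illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ReasoningChain:
    findings: list[str]   # visual features extracted from the fundus image
    evidence: list[str]   # knowledge snippets retrieved for each finding
    diagnosis: str

def build_reasoning_chain(findings, knowledge_base, diagnose):
    """RAG-style chain: findings -> retrieved knowledge -> diagnosis."""
    evidence = [knowledge_base.get(f, "no entry") for f in findings]
    return ReasoningChain(findings, evidence, diagnose(findings, evidence))

def process_reward(chain, reference_diagnosis, knowledge_base):
    """Toy process reward: mixes knowledge grounding with outcome correctness."""
    grounded = sum(e != "no entry" for e in chain.evidence)
    coherence = grounded / max(len(chain.findings), 1)  # share of grounded steps
    outcome = 1.0 if chain.diagnosis == reference_diagnosis else 0.0
    return 0.5 * coherence + 0.5 * outcome

# Minimal usage with a toy two-entry knowledge base
kb = {"microaneurysms": "early sign of diabetic retinopathy",
      "hard exudates": "lipid deposits associated with macular edema"}
chain = build_reasoning_chain(
    ["microaneurysms", "hard exudates"], kb,
    diagnose=lambda f, e: "diabetic retinopathy")
print(process_reward(chain, "diabetic retinopathy", kb))  # 1.0
```

In an actual RLVR loop this scalar would be combined with the verifiable outcome reward to update the policy; the sketch only shows how a process reward can supervise intermediate reasoning steps rather than the final answer alone.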

Section 04

[Evidence] Experimental Validation and Ablation Study Results

Fundus-R1 significantly outperforms baselines such as Qwen2.5-VL on three benchmarks (FunBench, Omni-Fundus, and GMAI-Fundus). Ablation studies show that combining RAG with process rewards yields the best results, and that even a small knowledge base improves performance. The model leads in classification accuracy, reasoning soundness, and generalization ability.


Section 05

[Conclusion] Significance and Impact of Fundus-R1

It overturns the perception that high performance requires proprietary data, providing an open-source, reproducible baseline that accelerates progress in ophthalmic AI. It also promotes the democratization of medical AI, allowing more institutions to participate in research and development and benefiting a wider range of patients.


Section 06

[Future Directions] Limitations and Follow-up Research Plans

Limitations: the diversity of public data is insufficient, and the generated reasoning chains still fall short of expert level. Future directions: expand the knowledge base to cover rare diseases, refine reasoning chains through human–machine collaboration, and extend the approach to other imaging modalities such as OCT and ultra-widefield (UWF) photography.