# RetinalGPT: An Open-Source Retinal Clinical Dialogue Assistant Based on Large Vision-Language Models

> The research team from Arizona State University has open-sourced the RetinalGPT data construction pipeline. This project uses large vision-language models to generate multi-turn dialogue data aligned with clinical preferences for fundus images, supporting the processing and dialogue generation of multiple retinal datasets.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-20T23:45:29.000Z
- Last activity: 2026-04-20T23:49:00.751Z
- Popularity: 141.9
- Keywords: RetinalGPT, large vision-language models, retinal imaging, medical AI, dialogue data construction, fundus disease screening, multimodal models, clinical preference alignment
- Page link: https://www.zingnex.cn/en/forum/thread/retinalgpt
- Canonical: https://www.zingnex.cn/forum/thread/retinalgpt
- Markdown source: floors_fallback

---

## RetinalGPT Open-Sourced: A Data Construction Pipeline for a Retinal Clinical Dialogue Assistant Built on Large Vision-Language Models

The research team from Arizona State University has open-sourced the RetinalGPT data construction pipeline, which uses large vision-language models to generate clinically preference-aligned multi-turn dialogue data for retinal images and supports processing of several mainstream retinal datasets. The goal is to address the lack of interactive capability in traditional AI-assisted diagnosis systems and to supply high-quality data for training clinical dialogue assistants.

## Project Background and Clinical Significance

Early screening and diagnosis of retinal diseases are crucial for preventing vision loss. Traditional AI-assisted diagnosis systems output only a single classification result or segmentation mask; they cannot interact with clinicians, explain their diagnostic basis, or answer follow-up questions. Large Vision-Language Models (LVLMs) show great potential in medical image understanding, but general-purpose models lack sufficient clinical knowledge and fluency with professional terminology. RetinalGPT was created to fill this gap: a clinical-preference dialogue assistant for retinal images. Trained on high-quality multi-turn dialogue datasets, the model can learn clinicians' questioning habits, diagnostic logic, and preferred modes of expression.

## Technical Architecture and Core Design

The core innovation of RetinalGPT lies in data-level optimization:
1. **Description Builder**: Implements a unified description builder for mainstream retinal datasets such as APTOS, EyeQ, IDRID, MICCAI, Messidor, ODIR, RFMiD, and UK Biobank, converting heterogeneous annotations (disease labels, image quality scores, etc.) into unified natural language descriptions.
2. **Dialogue Generation Pipeline**: Provides two modes—script-first mode (custom generation scripts like ins_UK.py) and pipeline-first mode (unified entry run_conversation_pipeline.py, supporting standardized processing across datasets).
3. **Asynchronous API Calls**: The instruction_gen_async.py module implements asynchronous calls, supporting text-only/image-conditioned generation and batch processing to improve the efficiency of large-scale data generation.

## Data Output Format and Application Scenarios

The generated dialogue data is stored in JSONL format. Each record includes a unique identifier (id), an image path (image), and the multi-turn dialogue content (conversations). The output can be further merged, cleaned, aligned, or converted into nested JSON for model fine-tuning. The project focuses on data construction and dialogue generation rather than providing a complete end-to-end training codebase, so it should be paired with a base framework such as LLaVA for model training.
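A hypothetical example of one such JSONL record, and of merging records into nested JSON for fine-tuning. The `from`/`value` turn keys follow the common LLaVA convention and are an assumption here, as is the dialogue content itself:

```python
import json

# One dialogue record: id, image path, and multi-turn conversations.
record = {
    "id": "aptos_001",
    "image": "images/aptos_001.png",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat does this fundus image show?"},
        {"from": "gpt", "value": "The image shows moderate non-proliferative diabetic retinopathy."},
        {"from": "human", "value": "Are there any visible hemorrhages?"},
        {"from": "gpt", "value": "Yes, scattered dot hemorrhages are visible in the posterior pole."},
    ],
}

# JSONL stores one JSON object per line ...
line = json.dumps(record, ensure_ascii=False)

# ... which can later be merged into one nested JSON array for fine-tuning.
def jsonl_to_nested(lines: list[str]) -> str:
    return json.dumps([json.loads(l) for l in lines], indent=2, ensure_ascii=False)

print(jsonl_to_nested([line])[:120])
```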

## Environmental Dependencies and Deployment Recommendations

RetinalGPT follows the LLaVA v0 environment specification. Recommended deployment process:
1. Configure the standard LLaVA runtime environment
2. Install additional dependencies for RetinalGPT
3. Create a Python 3.10 virtual environment using conda
4. Install project dependencies via requirements.txt
A layered dependency management strategy ensures compatibility with upstream projects and avoids redundant packaging of the LLaVA training stack.
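The steps above might look roughly like the following shell session. The environment name and the local LLaVA checkout path are assumptions; the post does not specify them:

```shell
# Create an isolated Python 3.10 environment (name "retinalgpt" is illustrative)
conda create -n retinalgpt python=3.10 -y
conda activate retinalgpt

# 1) Configure the standard LLaVA runtime environment
#    (assumes the upstream LLaVA repo is checked out at ./LLaVA)
pip install -e ./LLaVA

# 2) Install RetinalGPT's additional dependencies on top
pip install -r requirements.txt
```

Keeping RetinalGPT's requirements.txt thin on top of the LLaVA base is what the layered strategy refers to: the heavy training stack lives upstream, and the pipeline only adds what it needs.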

## Application Value and Future Outlook

The open-source value of RetinalGPT:
- **Standardized Data Processing**: The unified description builder lowers the threshold for multi-center research
- **Clinical Preference Alignment**: Simulates real interaction scenarios to train AI assistants more in line with clinical needs
- **Enhanced Interpretability**: Multi-turn dialogue improves the transparency and credibility of AI systems
- **Research Reproducibility**: The open-source pipeline supports experimental reproduction and improvement
In the future, the pipeline is expected to extend to other medical imaging modalities such as dermoscopy, pathology slides, and radiological images, helping medical AI evolve from "black-box classifiers" into "interactive clinical assistants".
