Reading

RetinalGPT: Open Source of Retinal Clinical Dialogue Assistant Based on Large Vision-Language Models

The research team from Arizona State University has open-sourced the RetinalGPT data construction pipeline. This project uses large vision-language models to generate multi-turn dialogue data aligned with clinical preferences for fundus images, supporting the processing and dialogue generation of multiple retinal datasets.

RetinalGPT大视觉语言模型视网膜影像医学AI对话数据构建眼底疾病筛查多模态模型临床偏好对齐

Published 2026-04-21 07:45Recent activity 2026-04-21 07:49Estimated read 7 min

RetinalGPT: Open Source of Retinal Clinical Dialogue Assistant Based on Large Vision-Language Models

Section 01

RetinalGPT Open Source: Data Construction Pipeline for Retinal Clinical Dialogue Assistant Based on Large Vision-Language Models

The research team from Arizona State University has open-sourced the RetinalGPT data construction pipeline. It uses large vision-language models to generate multi-turn dialogue data for retinal images aligned with clinical preferences, supporting the processing of multiple mainstream retinal datasets. The aim is to solve the problem that traditional AI-assisted diagnosis systems lack interactive capabilities and provide high-quality data for training clinical dialogue assistants.

Section 02

Project Background and Clinical Significance

Early screening and diagnosis of retinal diseases are crucial for preventing vision loss. Traditional AI-assisted diagnosis systems only output single classification results or segmentation masks, lacking the ability to interact with clinicians and making it difficult to explain diagnostic basis or answer follow-up questions. Large Vision-Language Models (LVLMs) have great potential in medical image understanding, but general models lack sufficient clinical knowledge and the ability to express professional terms. The RetinalGPT project was thus born, aiming to build a clinical preference dialogue assistant for retinal images. Through training with high-quality multi-turn dialogue datasets, the model can understand clinicians' questioning habits, diagnostic logic, and preference expressions.

Section 03

Technical Architecture and Core Design

The core innovation of RetinalGPT lies in data-level optimization:

Description Builder: Implements a unified description builder for mainstream retinal datasets such as APTOS, EyeQ, IDRID, MICCAI, Messidor, ODIR, RFMiD, and UK Biobank, converting heterogeneous annotations (disease labels, image quality scores, etc.) into unified natural language descriptions.
Dialogue Generation Pipeline: Provides two modes—script-first mode (custom generation scripts like ins_UK.py) and pipeline-first mode (unified entry run_conversation_pipeline.py, supporting standardized processing across datasets).
Asynchronous API Calls: The instruction_gen_async.py module implements asynchronous calls, supporting text-only/image-conditioned generation and batch processing to improve the efficiency of large-scale data generation.

Section 04

Data Output Format and Application Scenarios

The generated dialogue data is stored in JSONL format. Each record includes a unique identifier (id), image path (image), and multi-turn dialogue content (conversations). The output can be further merged, cleaned, aligned, or converted into nested JSON for model fine-tuning. The project focuses on data construction and dialogue generation, not a complete end-to-end training codebase, so it needs to be used with basic frameworks like LLaVA for model training.

Section 05

Environmental Dependencies and Deployment Recommendations

RetinalGPT is built based on the LLaVA v0 environment specifications. Recommended deployment process:

Configure the standard LLaVA runtime environment
Install additional dependencies for RetinalGPT
Create a Python 3.10 virtual environment using conda
Install project dependencies via requirements.txt A layered dependency management strategy ensures compatibility with upstream projects and avoids redundant packaging of the LLaVA training stack.

Section 06

Application Value and Future Outlook

Open Source Value of RetinalGPT:

Standardized Data Processing: The unified description builder lowers the threshold for multi-center research
Clinical Preference Alignment: Simulates real interaction scenarios to train AI assistants more in line with clinical needs
Enhanced Interpretability: Multi-turn dialogue improves the transparency and credibility of AI systems
Research Reproducibility: The open-source pipeline supports experimental reproduction and improvement In the future, it is expected to expand to other medical image modalities such as dermoscopy, pathological sections, and radiological images, promoting the transformation of medical AI from "black-box classifiers" to "interactive clinical assistants".

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

Building an AWS Generative AI Application from Scratch: EC2 + Bedrock Hands-On Tutorial

A complete cloud-native AI application development guide for beginners, building a simple generative AI chatbot using Amazon EC2, Apache, Python CGI, and Amazon Bedrock, covering architecture design, IAM permission configuration, security best practices, and cost optimization suggestions.

Recent activity 2026-06-02 19:49