
HealthGPT: A Large-Scale Multimodal Medical Model Unifying Medical Visual Understanding and Generation

The HealthGPT model proposed by the Zhejiang University team unifies medical image understanding and generation capabilities through heterogeneous knowledge adaptation technology, and has been recognized with a Spotlight at ICML 2025.

Tags: Medical AI · Multimodal Models · Vision-Language Models · ICML · Medical Imaging · Image Generation · Zhejiang University · Medical Large Models
Published 2026-05-08 05:41 · Recent activity 2026-05-08 10:07 · Estimated read: 5 min

Section 01

Introduction: HealthGPT, a Multimodal Medical Model Unifying Medical Visual Understanding and Generation

The Zhejiang University team proposed the HealthGPT model, which for the first time unifies medical image understanding and generation within a single framework using heterogeneous knowledge adaptation technology. This achievement has been recognized with a Spotlight at ICML 2025. HealthGPT addresses the resource waste and performance bottlenecks caused by traditional medical AI's practice of building separate models for each task, providing an efficient multimodal solution for medical scenarios.


Section 02

Research Background and Challenges

Medical AI must both interpret medical images and synthesize them, and these two needs pull model design in different directions. Traditional practice trains a separate model for each task, so the models cannot share knowledge, which leads to duplicated resources and performance bottlenecks. Integrating visual understanding and generation within a single unified framework has therefore become a pressing open problem.


Section 03

Core Technical Innovations: Heterogeneous Knowledge Adaptation and Unified Framework

Heterogeneous Knowledge Adaptation Mechanism

  • Cross-modal alignment: Establish precise mapping between visual features and medical concepts
  • Hierarchical knowledge fusion: Multi-level integration from pixel level to semantic level
  • Dynamic knowledge retrieval: Adaptive invocation of relevant knowledge

Unified Understanding-Generation Framework

The model adopts a single Transformer architecture and switches between the two tasks via task prompts and attention mechanisms. This lets both tasks share knowledge, improves data efficiency, and keeps understanding and generation semantically consistent.
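The dual-task switching described above can be sketched as task-specific low-rank adapters applied on top of a shared frozen weight, selected by a task prompt. This is a minimal illustration of the general technique, not HealthGPT's actual implementation; all class and variable names are invented, the matrices are tiny, and the adapters are initialized with a constant (in practice one factor would be random and the other zero).

```python
# Minimal sketch of task-conditioned low-rank adaptation over a shared
# frozen weight. Pure Python so it runs anywhere; names are illustrative
# and do not come from the HealthGPT codebase.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    """Element-wise sum of two same-shape matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

class LowRankAdapter:
    """Rank-r update delta_W = A @ B: far fewer parameters than full W."""
    def __init__(self, d_in, d_out, rank):
        # Constant init for a deterministic demo; real adapters are trained.
        self.A = [[0.01] * rank for _ in range(d_in)]
        self.B = [[0.01] * d_out for _ in range(rank)]
    def delta(self):
        return matmul(self.A, self.B)

class HeterogeneousLayer:
    """One shared frozen weight plus one adapter per task."""
    def __init__(self, d, rank=2):
        # Frozen base weight (identity here, just for the demo).
        self.W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.adapters = {
            "understand": LowRankAdapter(d, d, rank),
            "generate": LowRankAdapter(d, d, rank),
        }
    def forward(self, x, task):
        # The task prompt selects which adapter modifies the shared weight.
        W_eff = add(self.W, self.adapters[task].delta())
        return matmul(x, W_eff)

layer = HeterogeneousLayer(d=4)
x = [[1.0, 2.0, 3.0, 4.0]]
y_und = layer.forward(x, "understand")
y_gen = layer.forward(x, "generate")
```

The point of the pattern is that both tasks read from the same frozen backbone, so shared medical knowledge lives in `W` while only the small per-task factors diverge.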

Large-Scale Medical Pre-Training

The model is pre-trained on multimodal datasets covering modalities such as X-ray and CT, using a combination of contrastive and generative learning objectives.
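Combining contrastive and generative objectives typically means summing an alignment loss over matched image-text pairs with a reconstruction loss on generated outputs. The sketch below shows one common way to do this (an InfoNCE-style contrastive term plus mean squared error), with an illustrative weighting; the actual loss formulation and weights used by HealthGPT are not specified in this article.

```python
# Hedged sketch: joint contrastive + generative pre-training loss.
# info_nce / reconstruction_mse / joint_loss are illustrative names.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def info_nce(img_embs, txt_embs, tau=0.07):
    """Contrastive term: matched image/text pairs should score highest."""
    total = 0.0
    for i, img in enumerate(img_embs):
        logits = [cosine(img, txt) / tau for txt in txt_embs]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_z)  # cross-entropy on the matched index
    return total / len(img_embs)

def reconstruction_mse(pred_pixels, true_pixels):
    """Generative term: mean squared error of reconstructed image values."""
    n = len(pred_pixels)
    return sum((p - t) ** 2 for p, t in zip(pred_pixels, true_pixels)) / n

def joint_loss(img_embs, txt_embs, pred, target, alpha=0.5):
    """Weighted sum of both objectives (alpha is an illustrative weight)."""
    return (alpha * info_nce(img_embs, txt_embs)
            + (1 - alpha) * reconstruction_mse(pred, target))
```

With correctly matched pairs the contrastive term is near zero, and it grows when image and text embeddings are swapped, which is the signal that drives cross-modal alignment.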


Section 04

Model Capabilities and Application Scenarios

Medical Image Understanding

  • Lesion detection and localization
  • Disease classification and diagnosis
  • Image report generation
  • Visual question answering

Medical Image Generation

  • Text-to-image synthesis
  • Image editing and restoration
  • Data augmentation
  • Multimodal conversion

Unified Interaction Interface

The model supports natural-language interaction through a single interface, lowering the barrier to clinical adoption.
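A unified interface implies that a single entry point decides whether a request is an understanding task or a generation task. The toy router below illustrates the idea with naive keyword matching; it is purely hypothetical, and a real system like HealthGPT would rely on instruction tuning and task tokens rather than string matching.

```python
# Hypothetical illustration of a single prompt-driven entry point.
# route_request is an invented name, not part of any HealthGPT API.

def route_request(prompt: str) -> str:
    """Toy task router: keyword cues stand in for learned task routing."""
    generation_cues = ("generate", "synthesize", "reconstruct", "inpaint")
    if any(cue in prompt.lower() for cue in generation_cues):
        return "generation"
    return "understanding"

print(route_request("Describe the lesion in this chest X-ray"))
print(route_request("Generate a synthetic CT slice with a 5mm nodule"))
```

Keeping both task families behind one prompt interface is what lets clinicians phrase either kind of request in plain language.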


Section 05

Experimental Validation and Performance

  • Understanding tasks: Reaches or exceeds the level of specialized models in tasks such as classification and segmentation
  • Generation tasks: Image visual quality and medical accuracy reach clinically usable levels
  • Cross-task transfer: Improves few-shot learning performance through knowledge transfer

Section 06

Open-Source Contributions and Community Impact

The team open-sourced code, pre-trained weights, dataset tools, and documentation to promote the popularization of medical AI technology and assist researchers in building applications.


Section 07

Current Limitations and Future Directions

Limitations

  • Data privacy constraints limit training scale
  • Generated images still require clinical validation before use
  • Coverage of medical specialties remains incomplete

Future Directions

  • Federated learning for privacy-preserving training
  • Fine-grained knowledge injection
  • Deep fusion of multimodal data
  • Enhanced interpretability

Section 08

Summary and Outlook

HealthGPT is an important milestone for medical multimodal large models, and its ICML 2025 Spotlight reflects the academic community's attention. It is expected to play a key role in areas such as computer-aided diagnosis and medical education, benefiting patients and medical workers alike.