Zing Forum

AI Diagnosis of Breast Ultrasound Images: Application of Vision-Language Models in Medical Image Classification and Lesion Localization

This article introduces a research project on breast ultrasound image classification and lesion localization based on Vision-Language Models (VLMs), with detailed analysis of its technical scheme, experimental design, and application prospects in medical AI.

Tags: Medical Imaging AI, Vision-Language Models, Breast Ultrasound, Lesion Localization, Few-shot Learning, CLIP, SAM, Deep Learning
Published 2026-06-01 13:38 | Recent activity 2026-06-01 13:52 | Estimated read 8 min

Section 01

[Introduction] AI Diagnosis of Breast Ultrasound: Research on Classification and Lesion Localization Using Vision-Language Models

This article introduces busi-vlm-localisation, a project that a University of Technology Sydney team released on GitHub on June 1, 2026. The project applies Vision-Language Models (VLMs) to breast ultrasound image classification and lesion localization. It uses the BUSI dataset and a two-stage framework: CLIP-family models are evaluated in the classification stage, and BUSSAM handles the localization stage. It also runs few-shot learning experiments. The aim is to ease the difficulty of interpreting breast ultrasound images and to provide a technical reference and open-source resources for AI-assisted medical diagnosis.

Section 02

Research Background and Clinical Significance

Breast cancer is the most common malignant tumor among women worldwide, and early detection is crucial. Breast ultrasound is important in screening because it is non-invasive and radiation-free, but it has drawbacks: image quality depends on the operator, lesion features overlap, physician training is costly, and resources are unevenly distributed. AI, especially deep learning, offers a way to address these problems, yet the use of general-purpose VLMs in medical imaging is still exploratory. This project sets out to help fill that gap.

Section 03

Dataset and Technical Scheme Architecture

Dataset: The project uses the BUSI dataset, a benchmark dataset for medical imaging AI. It contains breast ultrasound images with three class labels (benign, malignant, normal) and pixel-level lesion annotations.

Technical Architecture: A two-stage framework:

  1. Classification stage: Evaluate general VLMs (OpenAI CLIP) and medical-specific VLMs (BiomedCLIP, UniMed-CLIP);
  2. Localization stage: Use BUSSAM (an adapted version of SAM for the breast ultrasound field) for precise lesion segmentation.
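
The two stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `run_two_stage`, `mask_to_bbox`, and the toy `classify`/`segment` stubs are hypothetical names, not the project's API. In the real project a CLIP-family model and BUSSAM would take the place of the stubs, and deriving a bounding box from the mask is an assumption about how a segmentation can be turned into a localization.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]            # toy grayscale image as nested lists
Mask = List[List[int]]               # binary mask, 1 = lesion pixel
BBox = Tuple[int, int, int, int]     # (row_min, col_min, row_max, col_max)


@dataclass
class Finding:
    label: str
    mask: Mask
    bbox: Optional[BBox]


def mask_to_bbox(mask: Mask) -> Optional[BBox]:
    """Return the tightest box around all lesion pixels, or None if the mask is empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def run_two_stage(
    image: Image,
    classify: Callable[[Image], str],
    segment: Callable[[Image], Mask],
) -> Finding:
    """Stage 1: classify the image. Stage 2: segment it and box the mask."""
    label = classify(image)
    mask = segment(image)
    return Finding(label=label, mask=mask, bbox=mask_to_bbox(mask))


# Hypothetical stand-ins for the real models (assumptions for illustration only).
def toy_classify(image: Image) -> str:
    return "benign"


def toy_segment(image: Image) -> Mask:
    # Threshold at 0.5 purely so the example produces a non-empty mask.
    return [[1 if v > 0.5 else 0 for v in row] for row in image]


toy_image = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.1, 0.7, 0.2],
]
finding = run_two_stage(toy_image, toy_classify, toy_segment)
print(finding.label, finding.bbox)
```

Passing the two models in as callables keeps the stages independent, so either one can be swapped (for example, CLIP for BiomedCLIP) without touching the other.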

Section 04

Experimental Design and Methodology

Few-shot Classification Experiments

  • Setup: 1, 2, 4, 8, 16, or 32 samples per class, with 10 repeated experiments, evaluated on metrics such as accuracy and AUC;
  • Fine-tuning strategies:
    • Linear probing: Freeze the pre-trained visual encoder and train only the top classification layer (5000 iterations);
    • LoRA fine-tuning: A parameter-efficient method, trained for 100 epochs with batch size 8, gradient accumulation 4, early-stopping patience 18, head learning rate 1e-3, adapter learning rate 1e-4, LoRA rank 16, alpha 32, and dropout 0.1.

Lesion Localization Experiments

BUSSAM training configuration: 20 epochs, batch size 8, learning rate 0.0005, ViT-B backbone, input size 256, etc.
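
The few-shot setup described in this section can be sketched roughly as follows. This is a minimal illustration, not the project's code: `sample_k_shot`, the dictionary keys, and the toy label list are assumptions, and using the repeat index as the random seed is one plausible way to make the 10 repeats reproducible. The two derived values use standard conventions: gradient accumulation multiplies the effective batch size, and LoRA scales its low-rank update by alpha divided by rank.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

SHOTS = (1, 2, 4, 8, 16, 32)   # samples per class, as listed above
N_REPEATS = 10                 # repeated experiments

# Hyperparameters quoted in the text; the key names are hypothetical.
LORA_CONFIG = {
    "epochs": 100,
    "batch_size": 8,
    "grad_accum_steps": 4,
    "early_stopping_patience": 18,
    "head_lr": 1e-3,
    "adapter_lr": 1e-4,
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
}


def sample_k_shot(labels: Sequence[str], k: int, seed: int) -> List[int]:
    """Pick k training indices per class, reproducibly for a given seed."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen: List[int] = []
    for label in sorted(by_class):          # fixed class order keeps runs comparable
        pool = by_class[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has only {len(pool)} samples, need {k}")
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Toy labels for illustration only; these are NOT the BUSI class counts.
toy_labels = ["benign"] * 40 + ["malignant"] * 40 + ["normal"] * 40

# One training subset per (shots, repeat) pair.
splits: Dict[Tuple[int, int], List[int]] = {
    (k, rep): sample_k_shot(toy_labels, k, seed=rep)
    for k in SHOTS
    for rep in range(N_REPEATS)
}

# Samples contributing to each optimizer step when gradients are accumulated.
effective_batch = LORA_CONFIG["batch_size"] * LORA_CONFIG["grad_accum_steps"]
# Scaling applied to the low-rank update in the standard LoRA formulation.
lora_scale = LORA_CONFIG["lora_alpha"] / LORA_CONFIG["lora_rank"]

print(len(splits), effective_batch, lora_scale)
```

Sampling per class rather than from the pooled dataset keeps every class represented even in the 1-shot setting.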

Section 05

Preprocessing and Technical Implementation Details

Preprocessing: Caliper artifacts are detected and removed automatically, and annotations are standardized to ensure data quality.

Code Structure: A modular design built on Jupyter Notebooks:

  1. 01-preprocessing.ipynb (data preprocessing);
  2. 02-prompt-ensembling.ipynb (prompt ensembling);
  3. 03-vlm-classification.ipynb (VLM classification experiments);
  4. 04-train-bussam.ipynb (BUSSAM training).

Environment Configuration: Requires a CUDA-enabled PyTorch installation, plus API keys for Kaggle, Hugging Face, Azure OpenAI, etc., configured in a .env file.
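
Prompt ensembling, the topic of the second notebook, is commonly done in CLIP-style zero-shot classification by averaging the text embeddings of several prompts per class and comparing the result with the image embedding. The sketch below shows that general technique with toy vectors. It is an assumption about the approach, not a reproduction of the notebook: the function names and all embedding values are hypothetical, and a real run would obtain the vectors from a CLIP text and image encoder.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def ensemble_class_embedding(prompt_embeddings: List[Vector]) -> Vector:
    """Average the unit-length prompt embeddings of one class, then re-normalize."""
    normalized = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_predict(image_embedding: Vector, class_embeddings: Dict[str, Vector]) -> str:
    """Return the class whose ensembled text embedding has the highest cosine similarity."""
    img = l2_normalize(image_embedding)
    scores = {
        label: sum(a * b for a, b in zip(img, emb))
        for label, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get)


# Toy 3-dimensional embeddings, two prompts per class (made-up values).
toy_prompt_embeddings = {
    "benign": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "malignant": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
    "normal": [[0.0, 0.1, 1.0], [0.0, 0.0, 0.9]],
}
class_embeddings = {
    label: ensemble_class_embedding(embs)
    for label, embs in toy_prompt_embeddings.items()
}

toy_image_embedding = [0.1, 0.8, 0.1]
prediction = zero_shot_predict(toy_image_embedding, class_embeddings)
print(prediction)
```

Normalizing each prompt embedding before averaging stops a single long vector from dominating the ensemble.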

Section 06

Research Findings and Key Insights

The experimental design suggests the following:

  • Comparing general-purpose and medical-specific VLMs measures how well general models transfer to medical images and how much domain-specific pre-training adds;
  • The few-shot experiments (1 to 32 samples per class) probe the performance threshold in data-scarce scenarios;
  • Comparing linear probing with LoRA fine-tuning informs the choice of strategy when resources are limited.

Section 07

Limitations and Future Research Directions

Current Limitations: The approach is validated only on the BUSI dataset, lacks a systematic comparison with CNN baselines (ResNet, etc.), and has zero-shot performance that needs improvement.

Future Directions: Validate on larger datasets (e.g., BUS-BRA), optimize prompts to improve zero-shot performance, compare against CNN baselines, and further tune the LoRA configuration.

Section 08

Application Prospects of Medical AI and Conclusion

Application Prospects: The project tests the applicability of VLMs to medical imaging, its methodology offers guidance for few-shot scenarios, its open-source code lowers the barrier to entry for research, and the two-stage framework has potential for clinical translation.

Conclusion: This project is a representative example of medical AI research. Although VLMs are at an early stage in medical imaging, their generalization and few-shot capabilities matter for scenarios where annotations are scarce. The open-source resources support further work in the field, with the goal that AI-assisted diagnosis eventually benefits more patients.