Zing Forum

Reading

PrMed: A Perturbation-Resilient Medical Large Model for Real-World Healthcare Scenarios

PrMed is a medical foundation model specifically designed to address the non-standard expression characteristics of patients in real-world healthcare scenarios. Through two-stage training on 1.2 million multi-source medical samples, it achieves strong robustness against language perturbations such as colloquialism, emotional expression, and dialectal variations.

医学AI大语言模型抗扰动医患对话临床部署QwenGRPO多智能体
Published 2026-04-14 00:32Recent activity 2026-04-14 00:49Estimated read 9 min
PrMed: A Perturbation-Resilient Medical Large Model for Real-World Healthcare Scenarios
1

Section 01

PrMed: A Perturbation-Resilient Medical Large Model for Real-World Healthcare Scenarios (Introduction)

PrMed is a medical foundation model designed for non-standard patient expressions in real-world healthcare scenarios. Its core goal is to solve the performance gap of existing medical large models in clinical deployment caused by language perturbations. Trained on 1.2 million multi-source medical samples via two-stage training (LoRA supervised fine-tuning + GRPO reinforcement learning), it achieves strong robustness against language perturbations such as colloquialism, emotional expression, and dialectal variations. When converting from standardized language to heavily perturbed expressions, its accuracy drops by only 2.71 percentage points, far outperforming mainstream models.

2

Section 02

Background: Challenges of Language Perturbations in Real-World Healthcare Scenarios

Large language models perform well in medical benchmark tests but fall short in clinical deployment. The core reason is the mismatch between training data and real scenarios—existing models are trained on standardized corpora, while real patient expressions are full of language perturbations. The team from the Chinese Academy of Medical Sciences analyzed 569,913 Chinese online consultation records and found that 95.1% of patient utterances contain at least one perturbation, and 83.6% contain two or more, including colloquialism, dialects, emotional expression, incomplete grammar, subjective misdiagnosis, etc., revealing the fundamental challenges in the actual deployment of current medical AI.

3

Section 03

Core Design Philosophy of PrMed

PrMed (Perturbation-Resilient Medicine) focuses on maintaining stable reasoning capabilities in noisy real doctor-patient dialogues. Its design philosophy is 'finding order in chaos'—not eliminating non-standardization, but understanding and adapting to it. This shift in thinking allows PrMed's accuracy to drop by only 2.71 percentage points when facing language transitions, outperforming other mainstream models.

4

Section 04

Technical Architecture and Training Strategy

PrMed is based on the Qwen3-32B architecture and adopts two-stage training:

  1. LoRA Supervised Fine-Tuning (SFT): Trained on corpora containing perturbation-resilient reasoning chains, each data entry includes five steps of structured reasoning: emotion perception, perturbation detection, expression correction, chief complaint extraction, and medical reasoning;
  2. GRPO Reinforcement Learning: Optimizes perturbation response strategies through interactive training with patient simulators. The training data consists of 1.2 million entries, covering multi-source data such as Chinese online consultations, English medical dialogues, verifiable medical Q&A, medical question banks, and internal hospital records, all screened via 13-dimensional scoring to ensure quality.
5

Section 05

Perturbation Classification System: A Standardized Framework of 4 Categories and 12 Subcategories

The research team established a perturbation classification system with 4 categories and 12 subcategories, providing a standardized analysis framework for medical NLP:

  • Structural category: Perspective misalignment, incomplete grammar, reversed expression order;
  • Formal category: Internet slang, dialectal expressions, spelling/input errors;
  • Emotional category: Positive, negative, and repressed emotional interference;
  • Contextual category: Subjective misdiagnosis, irrelevant information insertion, vague and uncertain expressions. Fine-grained classification allows PrMed to handle different language variations in a targeted manner instead of treating them as vague 'noise'.
6

Section 06

Multi-Agent Data Construction Pipeline

PrMed uses multi-agent collaboration to build high-quality data, including three pipelines:

  1. Perturbation Annotation Pipeline: Three agents (DeepSeek-V3 initial annotation, Qwen3-235B-A22B review, GPT-5.1 arbitration for disputes) mimic manual multi-round verification, with efficiency exceeding manual work;
  2. Reasoning Chain Generation Pipeline: A generate-evaluate-refine cycle—generators produce five-step reasoning, scoring agents evaluate from 13 dimensions and three levels, and non-compliant samples receive feedback for iterative optimization (up to three rounds);
  3. Perturbation Synthesis Pipeline: A four-agent architecture that synthesizes perturbation samples of different severity levels based on real data distribution, used for stress testing and capability boundary exploration.
7

Section 07

Clinical Significance and Deployment Plan

PrMed adapts to clinical deployment needs:

  • Multiple usage methods: Python API calls, vLLM + Open WebUI complete web interface, supporting bilingual consultation in Chinese and English;
  • Easy deployment: vLLM service supports OpenAI-compatible API, seamlessly integrating into existing medical information systems;
  • Privacy and security: Open-source (Apache 2.0 license), allowing local deployment to ensure patient data privacy.
8

Section 08

Limitations and Future Outlook

Limitations of PrMed: Currently, it mainly targets language-level perturbations; its ability to integrate multi-modal data (images, test reports) needs to be enhanced; performance on extremely rare diseases requires more clinical validation. The team has publicly released the model weights, data construction pipeline, and perturbation classification system to facilitate community verification, improvement, and domain standardization. In the future, they will combine multi-modal technology with more clinical data, extend the perturbation-resilient concept to broader medical AI applications, and achieve the leap from the laboratory to the bedside.