Reading

PrMed: A Perturbation-Resilient Medical Large Model for Real-World Healthcare Scenarios

PrMed is a medical foundation model specifically designed to address the non-standard expression characteristics of patients in real-world healthcare scenarios. Through two-stage training on 1.2 million multi-source medical samples, it achieves strong robustness against language perturbations such as colloquialism, emotional expression, and dialectal variations.

医学AI大语言模型抗扰动医患对话临床部署QwenGRPO多智能体

Published 2026-04-14 00:32Recent activity 2026-04-14 00:49Estimated read 9 min

Section 01

PrMed: A Perturbation-Resilient Medical Large Model for Real-World Healthcare Scenarios (Introduction)

PrMed is a medical foundation model designed for non-standard patient expressions in real-world healthcare scenarios. Its core goal is to solve the performance gap of existing medical large models in clinical deployment caused by language perturbations. Trained on 1.2 million multi-source medical samples via two-stage training (LoRA supervised fine-tuning + GRPO reinforcement learning), it achieves strong robustness against language perturbations such as colloquialism, emotional expression, and dialectal variations. When converting from standardized language to heavily perturbed expressions, its accuracy drops by only 2.71 percentage points, far outperforming mainstream models.

Section 02

Background: Challenges of Language Perturbations in Real-World Healthcare Scenarios

Large language models perform well in medical benchmark tests but fall short in clinical deployment. The core reason is the mismatch between training data and real scenarios—existing models are trained on standardized corpora, while real patient expressions are full of language perturbations. The team from the Chinese Academy of Medical Sciences analyzed 569,913 Chinese online consultation records and found that 95.1% of patient utterances contain at least one perturbation, and 83.6% contain two or more, including colloquialism, dialects, emotional expression, incomplete grammar, subjective misdiagnosis, etc., revealing the fundamental challenges in the actual deployment of current medical AI.

Section 03

Core Design Philosophy of PrMed

PrMed (Perturbation-Resilient Medicine) focuses on maintaining stable reasoning capabilities in noisy real doctor-patient dialogues. Its design philosophy is 'finding order in chaos'—not eliminating non-standardization, but understanding and adapting to it. This shift in thinking allows PrMed's accuracy to drop by only 2.71 percentage points when facing language transitions, outperforming other mainstream models.

Section 04

Technical Architecture and Training Strategy

PrMed is based on the Qwen3-32B architecture and adopts two-stage training:

LoRA Supervised Fine-Tuning (SFT): Trained on corpora containing perturbation-resilient reasoning chains, each data entry includes five steps of structured reasoning: emotion perception, perturbation detection, expression correction, chief complaint extraction, and medical reasoning;
GRPO Reinforcement Learning: Optimizes perturbation response strategies through interactive training with patient simulators. The training data consists of 1.2 million entries, covering multi-source data such as Chinese online consultations, English medical dialogues, verifiable medical Q&A, medical question banks, and internal hospital records, all screened via 13-dimensional scoring to ensure quality.

Section 05

Perturbation Classification System: A Standardized Framework of 4 Categories and 12 Subcategories

The research team established a perturbation classification system with 4 categories and 12 subcategories, providing a standardized analysis framework for medical NLP:

Structural category: Perspective misalignment, incomplete grammar, reversed expression order;
Formal category: Internet slang, dialectal expressions, spelling/input errors;
Emotional category: Positive, negative, and repressed emotional interference;
Contextual category: Subjective misdiagnosis, irrelevant information insertion, vague and uncertain expressions. Fine-grained classification allows PrMed to handle different language variations in a targeted manner instead of treating them as vague 'noise'.

Section 06

Multi-Agent Data Construction Pipeline

PrMed uses multi-agent collaboration to build high-quality data, including three pipelines:

Perturbation Annotation Pipeline: Three agents (DeepSeek-V3 initial annotation, Qwen3-235B-A22B review, GPT-5.1 arbitration for disputes) mimic manual multi-round verification, with efficiency exceeding manual work;
Reasoning Chain Generation Pipeline: A generate-evaluate-refine cycle—generators produce five-step reasoning, scoring agents evaluate from 13 dimensions and three levels, and non-compliant samples receive feedback for iterative optimization (up to three rounds);
Perturbation Synthesis Pipeline: A four-agent architecture that synthesizes perturbation samples of different severity levels based on real data distribution, used for stress testing and capability boundary exploration.

Section 07

Clinical Significance and Deployment Plan

PrMed adapts to clinical deployment needs:

Multiple usage methods: Python API calls, vLLM + Open WebUI complete web interface, supporting bilingual consultation in Chinese and English;
Easy deployment: vLLM service supports OpenAI-compatible API, seamlessly integrating into existing medical information systems;
Privacy and security: Open-source (Apache 2.0 license), allowing local deployment to ensure patient data privacy.

Section 08

Limitations and Future Outlook

Limitations of PrMed: Currently, it mainly targets language-level perturbations; its ability to integrate multi-modal data (images, test reports) needs to be enhanced; performance on extremely rare diseases requires more clinical validation. The team has publicly released the model weights, data construction pipeline, and perturbation classification system to facilitate community verification, improvement, and domain standardization. In the future, they will combine multi-modal technology with more clinical data, extend the perturbation-resilient concept to broader medical AI applications, and achieve the leap from the laboratory to the bedside.

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

Folkering OS: When the Operating System Itself Is AI—A Self-Evolving Bare-Metal Rust System

Folkering OS is the world's first AI-native bare-metal operating system, entirely written in Rust no_std without relying on Linux, POSIX, or libc. It can generate commands from scratch, compile them into WASM, and run them in 10 seconds, achieving true self-evolution.

Recent activity 2026-04-09 16:15