# ProMedical: Hierarchical Fine-Grained Standard Alignment for Medical Large Models via Explicit Injection

> This article introduces ProMedical, a framework for aligning medical large models. It constructs a fine-grained clinical standard dataset, applies an explicit standard injection paradigm, and trains a multi-dimensional reward model that separates safety from capability. On Qwen3-8B, the reported gains are a 22.3% increase in accuracy and a 21.7% improvement in safety compliance.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-09T14:57:33.000Z
- Last activity: 2026-04-10T02:47:17.007Z
- Popularity: 135.2
- Keywords: medical large models, model alignment, reinforcement learning, multi-dimensional reward model, AI safety, clinical standards
- Page link: https://www.zingnex.cn/en/forum/thread/promedical
- Canonical: https://www.zingnex.cn/forum/thread/promedical
- Markdown source: floors_fallback

---

## [Introduction] ProMedical Framework: An Innovative Path for Hierarchical Fine-Grained Standard Alignment of Medical Large Models

ProMedical targets two core challenges in medical AI alignment: coarse-grained preference signals carry too little information, and safety and capability become entangled in a single reward. The framework constructs a fine-grained clinical standard dataset, applies an explicit standard injection paradigm, and trains a multi-dimensional reward model that scores safety and capability separately. On the Qwen3-8B base model, it achieves a reported 22.3% increase in accuracy and a 21.7% improvement in safety compliance.

## [Background] Unique Challenges in Medical AI Alignment

Medical AI alignment faces two core issues:

1. Limitations of coarse-grained preference signals: Traditional RLHF and DPO rely on binary preference judgments. In medical scenarios these judgments lose key details and fail to capture the multi-dimensional trade-off between diagnostic accuracy and safety.
2. Entanglement of safety and capability: Scalar reward models compress multiple dimensions into a single value. The model then either sacrifices safety for capability or becomes overly conservative, which reduces its practical usefulness. A single value also makes debugging and intervention difficult.
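To make the entanglement concrete, here is a minimal sketch with made-up numbers. The dimension names, scores, and equal-weight averaging are illustrative assumptions and are not taken from ProMedical. Two responses with very different safety profiles receive the same scalar reward once their dimensions are averaged, so a scalar signal cannot tell them apart.

```python
from statistics import mean

# Hypothetical per-dimension scores in [0, 1]; not data from the paper.
response_a = {"diagnostic_accuracy": 0.9, "safety": 0.3}  # capable but risky
response_b = {"diagnostic_accuracy": 0.3, "safety": 0.9}  # safe but unhelpful


def scalar_reward(scores):
    """Collapse all dimensions into one number by simple averaging."""
    return mean(list(scores.values()))


scalar_a = scalar_reward(response_a)
scalar_b = scalar_reward(response_b)

# The scalar view cannot separate the two responses...
same_scalar = abs(scalar_a - scalar_b) < 1e-9

# ...while the per-dimension view shows exactly where each one falls short.
safety_gap = response_b["safety"] - response_a["safety"]
```

Keeping the score vector intact preserves the information that an averaged reward throws away, which is the motivation for the multi-dimensional design described later.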

## [Methodology] ProMedical-Preference-50k: A Physician-Driven Fine-Grained Dataset

ProMedical-Preference-50k is a fine-grained clinical standard dataset built through human-machine collaboration:

1. Annotation process: The model generates candidate responses, and physicians evaluate them against multi-dimensional clinical standards such as diagnostic accuracy, treatment rationality, and safety.
2. Fine-grained scoring: Each sample carries detailed multi-dimensional scores instead of a simple good/bad judgment, giving the model rich information along each clinical dimension.
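As a rough illustration of what one fine-grained sample might look like, the sketch below models a record with per-dimension physician scores. The field names, the 1 to 5 scale, and the `best_by` helper are assumptions for illustration; the source does not specify the actual schema of ProMedical-Preference-50k.

```python
from dataclasses import dataclass, field

# Dimension names follow the criteria listed in the text; the exact set
# used by the dataset is an assumption.
DIMENSIONS = ("diagnostic_accuracy", "treatment_rationality", "safety")


@dataclass
class ScoredResponse:
    text: str
    scores: dict  # dimension name -> physician score (assumed 1-5 scale)

    def __post_init__(self):
        # Reject records that omit any of the expected dimensions.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimension scores: {sorted(missing)}")


@dataclass
class PreferenceSample:
    prompt: str
    candidates: list = field(default_factory=list)

    def best_by(self, dimension):
        """Return the candidate with the highest score on one dimension."""
        return max(self.candidates, key=lambda c: c.scores[dimension])


# Placeholder texts and scores, purely illustrative.
sample = PreferenceSample(
    prompt="placeholder clinical question",
    candidates=[
        ScoredResponse(
            "candidate A",
            {"diagnostic_accuracy": 4, "treatment_rationality": 3, "safety": 2},
        ),
        ScoredResponse(
            "candidate B",
            {"diagnostic_accuracy": 3, "treatment_rationality": 4, "safety": 5},
        ),
    ],
)

most_accurate = sample.best_by("diagnostic_accuracy")
safest = sample.best_by("safety")
```

In this toy record the most accurate candidate and the safest candidate differ, which a single good/bad label could not express.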

## [Methodology] Explicit Standard Injection Paradigm: Multi-Dimensional Reward Model Design

ProMedical proposes an explicit standard injection paradigm for training the multi-dimensional reward model ProMedical-RM:

1. Dimension decoupling architecture: The reward model outputs a multi-dimensional score vector, so safety and professional capability are optimized separately.
2. Dynamic weight adjustment: The weight of each dimension is stated explicitly during training and can be adjusted flexibly by scenario (for example, emergency versus chronic disease).
3. Targeted GRPO guidance: Multi-dimensional reward signals let GRPO improve the model's performance on each dimension in a targeted manner.
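One way to picture how a score vector and explicit weights could feed GRPO is sketched below. This is a minimal sketch under stated assumptions, not ProMedical's implementation: the weight profiles, dimension names, scores, and weighted-sum aggregation are hypothetical. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO formulation, and the population standard deviation is used here as a simplifying choice.

```python
from statistics import mean, pstdev

# Hypothetical scenario-specific weights; the source only says that
# weights can be adjusted per scenario (emergency / chronic disease).
WEIGHTS = {
    "emergency": {"capability": 0.4, "safety": 0.6},
    "chronic": {"capability": 0.6, "safety": 0.4},
}


def weighted_reward(score_vector, scenario):
    """Combine a per-dimension score vector using the scenario's weights."""
    weights = WEIGHTS[scenario]
    return sum(weights[dim] * score_vector[dim] for dim in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up score vectors for three responses sampled for the same prompt.
group = [
    {"capability": 1.0, "safety": 0.3},
    {"capability": 0.5, "safety": 0.9},
    {"capability": 0.7, "safety": 0.6},
]

emergency_adv = group_relative_advantages(
    [weighted_reward(s, "emergency") for s in group]
)
chronic_adv = group_relative_advantages(
    [weighted_reward(s, "chronic") for s in group]
)

# Index of the response each weight profile pushes the policy toward.
best_emergency = max(range(len(group)), key=emergency_adv.__getitem__)
best_chronic = max(range(len(group)), key=chronic_adv.__getitem__)
```

With these made-up numbers, the emergency profile favors the second response and the chronic profile favors the first, which is the kind of scenario-dependent steering that adjustable weights are meant to allow.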

## [Evidence] Evaluation and Experimental Results: Dual Improvements in Accuracy and Safety

The framework is validated through double-blind expert evaluation on ProMedical-Bench:

1. Double-blind mechanism: Experts score responses anonymously, which eliminates brand bias.
2. Experimental results: Qwen3-8B achieves a 22.3% increase in accuracy and a 21.7% improvement in safety compliance, reaching a level comparable to top closed-source models. It also generalizes well to the external benchmark UltraMedical.
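The blinding step can be pictured as stripping model identities before experts see any response. The sketch below is an assumption about how such a step might look; the function name, ID format, and fixed seed are hypothetical, and the source does not describe ProMedical-Bench's actual tooling.

```python
import random


def blind_responses(responses_by_model, seed=0):
    """Shuffle responses and replace model names with opaque IDs.

    Returns (blinded, key). `blinded` is what evaluators see; `key` maps
    each opaque ID back to its model and stays sealed until scoring ends.
    """
    rng = random.Random(seed)
    items = list(responses_by_model.items())
    rng.shuffle(items)  # break any link between position and model

    blinded, key = [], {}
    for i, (model, text) in enumerate(items):
        anon_id = f"response-{i + 1}"
        blinded.append({"id": anon_id, "text": text})
        key[anon_id] = model
    return blinded, key


# Placeholder model names and answers, purely illustrative.
blinded, key = blind_responses(
    {
        "model_x": "answer from model x",
        "model_y": "answer from model y",
        "model_z": "answer from model z",
    }
)
```

Evaluators receive only `blinded`, so their scores cannot depend on which system produced a response; the `key` mapping is consulted only after all scores are recorded.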

## [Conclusion] Open-Source Contributions and Framework Value

The ProMedical framework optimizes safety and capability jointly. Its open-source dataset, reward model, and evaluation benchmark are valuable in three ways:

1. They ensure reproducibility and support medical AI safety research.
2. They provide a complete toolchain that encourages the industry to adopt multi-dimensional evaluation standards.
3. They demonstrate the potential of open-source medical AI and accelerate the adoption of safe medical intelligent systems.

## [Outlook] Technical Insights and Future Directions

ProMedical provides methodological insights for AI alignment in high-risk fields:

1. Fine-grained modeling is the key to reliable alignment.
2. Explicit separation of multi-dimensional goals provides a path for controllable optimization of complex systems.
3. Human-machine collaborative data construction will become a standard practice in professional fields.

In the future, the approach can be further extended to other high-risk AI application scenarios.
