# Communication Dilemmas of Medical AI: A Study on Empathy, Readability, and Alignment of Clinical Large Language Models

> Using a multi-dimensional evaluation, this paper reveals gaps between clinical large language models (LLMs) and doctors in emotional polarity and language complexity, and finds that collaborative rewriting, rather than direct generation, is the most effective way to apply LLMs in medical scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-22T17:17:27.000Z
- Last activity: 2026-04-23T23:26:33.875Z
- Popularity: 120.8
- Keywords: medical AI, large language models, doctor-patient communication, readability, sentiment analysis, clinical decision-making, human-AI collaboration, health literacy
- Page link: https://www.zingnex.cn/en/forum/thread/ai-llm-74504152
- Canonical: https://www.zingnex.cn/forum/thread/ai-llm-74504152
- Markdown source: floors_fallback

---

## [Introduction] Study on Communication Dilemmas of Medical AI: Gaps in Clinical LLMs and the Value of Collaborative Rewriting

Large Language Models (LLMs) are widely used in the medical field, but two questions remain open: does their communication style align with clinical standards, and can they balance medical accuracy with emotional resonance? This study compares AI and doctor communication samples along three dimensions (semantic fidelity, readability, and emotional resonance) and finds that clinical LLMs amplify emotional polarity and fail to control language complexity. Collaborative rewriting, in which the AI optimizes content a doctor has already written, proves to be the most effective application method. The core view is that AI should serve as a communication enhancer for doctors rather than a replacement.

## Research Background and Motivation

Medical communication requires more than medical knowledge: doctors must also translate technical language into terms patients understand and read the patient's emotional state. LLMs have demonstrated strong medical knowledge and are being explored for tasks such as automatically generating diagnostic explanations. Existing applications, however, mostly focus on information accuracy and lack a systematic evaluation of communication quality (language complexity, emotional tone, semantic fidelity). This study fills that gap by comparing general LLMs, medical-specific LLMs, and real doctor samples to quantify the communication characteristics and limitations of AI.

## Evaluation Framework and Methodology

**Three-Dimensional Evaluation Framework**:

1. **Semantic Fidelity**: Checks medical accuracy and completeness using embedding-space similarity combined with manual expert evaluation.
2. **Readability**: Measured with the Flesch-Kincaid Grade Level (FKGL); the ideal level is 6th to 8th grade (US standard).
3. **Emotional Resonance**: Quantifies the emotional polarity and intensity of the text and compares them with doctor samples.

**Evaluation Objects**: General LLMs (e.g., GPT-5, Claude series), medical fine-tuned models, and prompt-engineering variants, compared against real doctors' written answers and doctor-patient dialogue transcripts.
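
The readability dimension can be made concrete with the standard FKGL formula, `0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59`. The following is a minimal sketch, not the study's actual pipeline: the sentence splitter and the vowel-group syllable counter are rough heuristics chosen here for illustration, and all function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula applied to raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fkgl(text: str) -> float:
    """Estimate FKGL for English text using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return fkgl_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    plain = "Take one pill each day. Call us if you feel sick."
    dense = "Pharmacological intervention necessitates continuous cardiovascular monitoring."
    print(f"plain: {fkgl(plain):.2f}  dense: {fkgl(dense):.2f}")
```

Because the syllable counter is approximate, scores from this sketch will differ somewhat from those of established readability tools; it is meant only to show how sentence length and word length drive the grade level.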

## Key Findings: Communication Differences Between AI and Doctors

1. **Amplified Emotional Polarity**: The proportion of negative expressions in model output (43.14%-45.10%) is higher than in doctors' responses (37.25%). The study attributes this to training data bias and to the models not having learned the emotional balance that doctors maintain.
2. **Uncontrolled Language Complexity**: The FKGL scores of the large models (16.91-17.60) are far higher than those of doctors (11.47-12.50). Model output therefore demands college-level reading ability, which hinders patient understanding. Doctors' own scores also sit above the ideal 6th to 8th grade range, but by a much smaller margin.
3. **Limited Effect of Empathy Prompts**: Empathy-oriented prompts reduce extreme negativity and complexity, but they do not improve semantic fidelity.
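
To illustrate how the emotional polarity comparison can be read, here is a small sketch that computes the share of negative expressions from a list of polarity labels and then the gap, in percentage points, between the reported model range and the doctor baseline. The percentages are the figures quoted above; the label names and the toy label list are assumptions, and the sketch does not say anything about which sentiment classifier the study used.

```python
from typing import Sequence


def negative_share(labels: Sequence[str]) -> float:
    """Fraction of items labeled 'negative' (label names are an assumption)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == "negative") / len(labels)


def gap_in_points(model_pct: float, doctor_pct: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(model_pct - doctor_pct, 2)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    toy = ["negative", "neutral", "positive", "negative"]
    print(negative_share(toy))

    # Figures reported in the findings above.
    doctor = 37.25
    for model in (43.14, 45.10):
        print(model, "vs", doctor, "->", gap_in_points(model, doctor), "points")
```

On the reported figures, the models are roughly 5.9 to 7.9 percentage points more negative than doctors.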

## Breakthrough Performance of Collaborative Rewriting

**Advantages of Rewriting Mode**: When the model rewrites doctors' drafts or reference texts, average semantic similarity reaches 0.93 while readability improves and emotional extremes are reduced. This mode combines human professional judgment with the AI's strength in language optimization.

**Dual Evaluation**: Medical experts rate doctors' original content highest for epistemic quality, while patients prefer the rewritten version, finding it clearer and more emotionally appropriate.
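
The semantic similarity figure refers to similarity in an embedding space. As a hedged sketch, assuming cosine similarity between an embedding of the doctor's draft and an embedding of the AI rewrite, a drift check might look like the following. The vectors stand in for the output of whatever embedding model is used, and the `min_similarity` threshold of 0.90 is a hypothetical value chosen for illustration, not one taken from the study.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def flag_semantic_drift(
    draft_vec: Sequence[float],
    rewrite_vec: Sequence[float],
    min_similarity: float = 0.90,  # hypothetical threshold, not from the study
) -> bool:
    """Return True when the rewrite has drifted too far from the doctor's draft."""
    return cosine_similarity(draft_vec, rewrite_vec) < min_similarity


if __name__ == "__main__":
    draft = [1.0, 2.0]       # placeholder embedding of the doctor's draft
    rewrite = [1.0, 2.1]     # placeholder embedding of the AI rewrite
    print(flag_semantic_drift(draft, rewrite))
```

A check like this only flags rewrites that need extra attention. It complements, and does not replace, the expert evaluation that the framework pairs with embedding similarity.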

## Practical Implications: AI as a Communication Enhancer

**Collaborative Mode**: Doctors lead the core medical content, while the AI handles language simplification, structure optimization, emotional calibration, and multilingual adaptation.

**Risk Management**: Directly generated content carries risks of provoking anxiety, causing misunderstanding, and omitting information. It therefore requires human supervision and review, and AI output should be treated only as a draft.

## Technical Improvements and Ethical Considerations

**Technical Directions**:

1. Readability-constrained training: incorporate FKGL into the reward function.
2. Emotional calibration module: adjust tone in a post-processing step.
3. Domain-adaptive rewriting: distinguish key medical information from auxiliary description.

**Ethical Considerations**:

1. Health Equity: Avoid complex language that exacerbates information inequality.
2. Emotional Boundaries: Keep communication strategies transparent, with no hidden steering of patients.
3. Responsibility Attribution: Make the doctor-led model explicit and define who is responsible for what.
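
The first technical direction, incorporating FKGL into the reward function, could be sketched as a simple penalty term. This is an illustration under stated assumptions, not the study's method: the target band of 6 to 8 follows the ideal grade range given in the evaluation framework, while the linear penalty shape, the `weight` value, and the function names are hypothetical.

```python
def readability_penalty(fkgl_score: float, low: float = 6.0, high: float = 8.0) -> float:
    """Distance, in grade levels, from the target readability band (0 inside it)."""
    if fkgl_score > high:
        return fkgl_score - high
    if fkgl_score < low:
        return low - fkgl_score
    return 0.0


def shaped_reward(task_reward: float, fkgl_score: float, weight: float = 0.1) -> float:
    """Task reward minus a weighted readability penalty (weight is hypothetical)."""
    return task_reward - weight * readability_penalty(fkgl_score)


if __name__ == "__main__":
    # A response at grade 7 keeps its full reward; one at grade 17 is penalized.
    print(shaped_reward(1.0, 7.0))
    print(shaped_reward(1.0, 17.0))
```

A penalty on distance from the band, rather than a reward for lower FKGL alone, avoids pushing the model toward oversimplified text that drops key medical information.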
