Zing Forum

Communication Dilemmas of Medical AI: A Study on Empathy, Readability, and Alignment of Clinical Large Language Models

This paper reveals the gaps between clinical large language models (LLMs) and doctors in emotional polarity and language complexity through multi-dimensional evaluation, and finds that collaborative rewriting rather than direct generation is the optimal application method for LLMs in medical scenarios.

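Tags: medical AI, large language models, doctor-patient communication, readability, sentiment analysis, clinical decision-making, human-AI collaboration, health literacy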
Published 2026-04-23 01:17 | Recent activity 2026-04-24 07:26 | Estimated read 7 min
Section 01

[Introduction] Study on Communication Dilemmas of Medical AI: Gaps in Clinical LLMs and the Value of Collaborative Rewriting

Large language models (LLMs) are widely used in medicine, but it remains unclear whether their communication style aligns with clinical standards and whether they can balance medical accuracy with emotional resonance. This study compares AI and doctor communication samples along three dimensions: semantic fidelity, readability, and emotional resonance. It finds that clinical LLMs amplify emotional polarity and produce overly complex language. Collaborative rewriting, in which AI refines content a doctor has already written, emerges as the best application mode. The core view is that AI should enhance doctors' communication rather than replace it.

Section 02

Research Background and Motivation

Medical communication requires doctors to combine medical knowledge, the ability to translate it into plain language, and emotional perception. LLMs have demonstrated strong medical knowledge and are being explored for tasks such as automatically generating diagnostic explanations. Existing applications, however, mostly focus on information accuracy and lack a systematic evaluation of communication quality (language complexity, emotional tone, semantic fidelity). This study fills that gap by comparing general LLMs, medical-specific LLMs, and real doctor samples to quantify the communication characteristics and limitations of AI.

Section 03

Evaluation Framework and Methodology

Three-Dimensional Evaluation Framework:

  1. Semantic Fidelity: Check medical accuracy and completeness through embedding-space similarity combined with manual expert evaluation;
  2. Readability: Use the Flesch-Kincaid Grade Level (FKGL) metric, with a target of 6th to 8th grade (US standard);
  3. Emotional Resonance: Quantify the emotional polarity and intensity of each text and compare them with doctor samples.

Evaluation Objects: General LLMs (e.g., GPT-5, Claude series), medical fine-tuned models, and prompt engineering variants, compared against real doctors' written answers and doctor-patient dialogue transcripts.
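
For readers unfamiliar with the readability metric, the standard FKGL formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below shows how such a score could be computed. It is only an illustration: the summary does not say which tool the study used, and the syllable counter here is a rough vowel-group heuristic (an assumption of this example), so scores will differ somewhat from dedicated readability libraries.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical example sentences, not taken from the study's data.
plain = "Your heart is fine. Take one pill each day."
jargon = ("The echocardiogram demonstrates preserved ventricular function "
          "without significant abnormality.")
print(f"plain:  {fkgl(plain):.2f}")
print(f"jargon: {fkgl(jargon):.2f}")
```

Short sentences built from one-syllable words score very low (even below zero), while a single long sentence of polysyllabic clinical terms scores far higher, which is the effect the readability dimension is meant to capture.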

Section 04

Key Findings: Communication Differences Between AI and Doctors

  1. Amplified Emotional Polarity: The share of negative expressions in model output (43.14%-45.10%) is higher than in doctors' text (37.25%), which the study attributes to training data bias and to models not having learned doctors' ability to balance emotional tone;
  2. Uncontrolled Language Complexity: The FKGL scores of the large models (16.91-17.60) are far higher than those of doctors (11.47-12.50); text at that level demands college-level reading ability, which hinders patient understanding;
  3. Limited Effect of Empathy Prompts: Empathy prompts reduce extreme negativity and complexity but do not improve semantic fidelity.
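
The negative-expression percentages in the first finding are simple proportions. As a minimal sketch of that arithmetic, assume each text has already been given a polarity score by some sentiment model (the summary does not name one), with scores below a threshold counted as negative. The function name, the threshold, and the sample scores are all hypothetical.

```python
from typing import Sequence


def negative_share(polarities: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of texts whose polarity score falls below `threshold`."""
    if not polarities:
        raise ValueError("need at least one polarity score")
    negative = sum(1 for p in polarities if p < threshold)
    return 100.0 * negative / len(polarities)


# Made-up scores for illustration only; not the study's data.
toy_scores = [-0.5, 0.2, 0.0, 0.7]
print(negative_share(toy_scores))
```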

Section 05

Breakthrough Performance of Collaborative Rewriting

Advantages of Rewriting Mode: When models rewrite doctors' drafts or reference texts, average semantic similarity reaches 0.93 while readability improves and emotional extremes are reduced. This mode combines human professional judgment with AI language optimization.

Dual Evaluation: Medical experts rate the doctors' original content highest in epistemic quality, while patients prefer the rewritten version, finding it clearer and emotionally more appropriate.
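
To make the 0.93 average semantic similarity more concrete: a common way to compare texts in embedding space is the cosine similarity between their embedding vectors, averaged over all (original, rewrite) pairs. The summary does not state which embedding model or similarity measure the study used, so cosine similarity is an assumption here, and the tiny vectors below are placeholders for real embeddings.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def mean_similarity(pairs: Iterable[Tuple[Vector, Vector]]) -> float:
    """Average cosine similarity over (doctor_text, rewrite) embedding pairs."""
    scores = [cosine_similarity(a, b) for a, b in pairs]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)


# Placeholder 2-d vectors; real sentence embeddings have hundreds of dimensions.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(mean_similarity(pairs))
```

A value near 1 means the rewrite stays close in meaning to the doctor's text, which is why the study pairs this automatic score with manual expert review rather than relying on it alone.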

Section 06

Practical Implications: AI as a Communication Enhancer

Collaborative Mode: Doctors lead the core medical content, while AI handles language simplification, structure optimization, emotional calibration, and multilingual adaptation.

Risk Management: Directly generated content risks causing anxiety, misunderstanding, and information omission. It requires human supervision and review, and AI output should be treated only as a draft.

Section 07

Technical Improvements and Ethical Considerations

Technical Directions:

  1. Readability constraint training (incorporate FKGL into reward functions);
  2. Emotional calibration module (post-processing to adjust tone);
  3. Domain-adaptive rewriting (distinguish between key medical information and auxiliary descriptions).

Ethical Considerations:

  1. Health Equity: Avoid complex language exacerbating information inequality;
  2. Emotional Boundaries: Transparent communication strategies without hidden guidance;
  3. Responsibility Attribution: Clarify the doctor-led model and define responsibilities.
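
Returning to the first technical direction, one way FKGL might enter a reward function is as a penalty on the distance from the 6th to 8th grade target band named in the evaluation framework. The sketch below is purely illustrative: the study's summary gives no formula, so the linear penalty, the `weight` value, and the function names are all assumptions of this example.

```python
def readability_penalty(grade: float, low: float = 6.0, high: float = 8.0) -> float:
    """Distance of an FKGL score from the target band [low, high]; 0 inside it."""
    if grade < low:
        return low - grade
    if grade > high:
        return grade - high
    return 0.0


def shaped_reward(base_reward: float, grade: float, weight: float = 0.1) -> float:
    """Hypothetical reward shaping: subtract a weighted readability penalty."""
    return base_reward - weight * readability_penalty(grade)


# A response at grade 7 keeps its full reward; one at grade 17 is penalized.
print(shaped_reward(1.0, 7.0))
print(shaped_reward(1.0, 17.0))
```

In practice the weight would need tuning so that the readability term does not outweigh the signal for medical accuracy.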