Zing Forum

Reading

Can AI Be a Doctor? A Study on the Empathy, Readability, and Alignment of Clinical LLMs

The study conducts a multi-dimensional evaluation of the communication capabilities of general-purpose and domain-specific LLMs in medical scenarios, finding that base models exhibit amplified emotional polarity and excessively high language complexity. While empathy prompts can reduce negative emotions and reading difficulty, the collaborative rewriting strategy performs best in terms of semantic fidelity, readability, and emotional regulation. The study indicates that LLMs are better suited as an enhancement tool for clinical communication than as a replacement for clinicians.

Medical AI · Clinical LLMs · Doctor-Patient Communication · Empathy · Readability · Semantic Alignment · Human-AI Collaboration · Medical Ethics · Patient Experience
Published 2026-04-23 01:17 · Recent activity 2026-04-23 10:54 · Estimated read 7 min

Section 01

[Introduction] Can AI Be a Doctor? Role Positioning and Capability Evaluation of Clinical LLMs

This article centers on the core question "Can AI become a doctor?" and evaluates general-purpose and medical-specific LLMs across multiple dimensions in doctor-patient communication scenarios. The study finds that current LLMs suffer from amplified emotional polarity and excessively high language complexity, but that both issues can be effectively mitigated through a collaborative rewriting strategy. The conclusion: LLMs are better suited as an enhancement tool for clinical communication than as a replacement for doctors.


Section 02

Research Background: AI Penetrates the Medical Field, Communication Capabilities to Be Verified

Large language models are rapidly entering medical scenarios (symptom self-check, medication guidance, etc.), but the question of whether AI's communication with patients meets clinical standards has not been fully answered. This study aims to systematically evaluate the performance of general-purpose and medical-specific LLMs in real doctor-patient interactions and reveal their capability boundaries and limitations.


Section 03

Evaluation Framework: Three-Dimensional Alignment Analysis

The study uses a three-dimensional evaluation system to measure the alignment of AI with clinical standards:

  1. Semantic Fidelity: Accuracy of medical facts and correctness of information transmission;
  2. Readability: Whether the text complexity is suitable for patient understanding (balancing professionalism and accessibility);
  3. Emotional Resonance: Appropriateness of emotional polarity and empathetic expression (conveying information while providing emotional support).
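Of these dimensions, semantic fidelity is typically quantified by comparing a model response against a reference (e.g. the doctor's original answer) with a similarity score. The study does not specify its implementation; as a minimal illustrative sketch, a bag-of-words cosine similarity captures the basic idea (real evaluations would likely use sentence embeddings instead):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts, in [0.0, 1.0]."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

doctor = "take one tablet twice daily with food"
rewrite = "take one tablet twice a day with food"
score = cosine_similarity(doctor, rewrite)  # high, but below 1.0
```

A rewrite that preserves wording scores near 1.0; a paraphrase scores lower under this crude token-overlap measure, which is exactly why embedding-based metrics are preferred for semantic fidelity.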

Section 04

Baseline Findings and Effectiveness of Optimization Strategies

Baseline Issues:

  • Amplified emotional polarity: the proportion of negative emotion in base-model outputs (43.14%-45.10%) exceeds that of doctors (37.25%), which can exacerbate patient anxiety;
  • Excessively high language complexity: the FKGL (Flesch-Kincaid Grade Level) of GPT-5/Claude outputs reaches 16.91-17.60 (graduate level), while doctors' responses sit at 11.47-12.50 (senior high school level).

Optimization Strategies:

  • Empathy prompts: reduce extreme negative emotions and lower reading difficulty, but yield no significant improvement in semantic fidelity;
  • Collaborative rewriting (restatement mode): Highest semantic similarity (average 0.93), improved readability, and effective control of emotional extremes.
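The FKGL scores cited above come from a fixed formula over average sentence length and syllables per word. A minimal sketch follows; the syllable counter is a naive vowel-group heuristic, so scores will deviate somewhat from dictionary-backed readability tools:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups; real tools use dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

jargon = ("The patient presents with idiopathic pulmonary fibrosis "
          "requiring immediate corticosteroid intervention.")
plain = "Your lungs are scarred. We will start a drug to calm this down."
```

Long sentences dense with polysyllabic terminology push the grade level far above the plain-language rewrite, which is the gap the study measures between model and doctor responses.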

Section 05

Evaluation from the Perspective of Two Stakeholders

  • Doctor's Perspective: AI trails real doctors in medical accuracy, clinical reasoning, and the soundness of diagnostic and treatment suggestions, underscoring AI's auxiliary role;
  • Patient's Perspective: patients prefer the rewritten AI responses, finding their clarity and emotional tone more satisfying.


Section 06

Core Conclusion: LLMs Are Clinical Communication Enhancers Rather Than Replacements

Taken together, the findings indicate that the most effective role for LLMs in medical scenarios is as a collaborative communication enhancer, not a replacement for clinical expertise. In terms of functional positioning, AI should focus on improving the quality and efficiency of communication rather than supplanting doctors' diagnostic decision-making authority; in the ideal division of labor, doctors provide the professional judgment and AI helps optimize how it is expressed.


Section 07

Practical Implications: Guiding Directions for Medical AI Development and Deployment

Recommendations for developers and deployers:

  1. Layered Processing: Separate content generation (ensuring accuracy) from expression optimization (adjusting style and emotion);
  2. Readability First: Control the density of terms, organize information clearly, and provide background explanations;
  3. Emotional Calibration: Automatically detect and adjust overly emotional expressions, and adjust empathy according to the context;
  4. Continuous Supervision: Maintain doctors' final review authority in key decisions and position AI as an auxiliary tool.
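The layered-processing recommendation (separating what is said from how it is said) can be sketched as a two-stage pipeline. Everything named below is an assumption for illustration, not the study's implementation: the glossary, function names, and the canned stage-1 draft are hypothetical.

```python
# Hypothetical plain-language glossary; a deployed system would use a
# curated clinical vocabulary, not this illustrative three-entry map.
GLOSSARY = {
    "hypertension": "high blood pressure",
    "analgesic": "pain reliever",
    "myocardial infarction": "heart attack",
}

def generate_content(question: str) -> str:
    # Stage 1 (content generation): stands in for an accuracy-controlled
    # step such as a doctor-reviewed draft or a validated model response.
    return "Your symptoms suggest hypertension; take the analgesic as directed."

def optimize_expression(clinical_text: str) -> str:
    # Stage 2 (expression optimization): adjusts style only, substituting
    # jargon with a plain term while keeping the clinical term in
    # parentheses so the rewrite cannot drift from the reviewed content.
    out = clinical_text
    for term, plain in GLOSSARY.items():
        out = out.replace(term, f"{plain} ({term})")
    return out

reply = optimize_expression(generate_content("Why is my blood pressure high?"))
```

Keeping the two stages separate means the expression layer can be tuned for readability and emotional calibration without any risk of altering the clinically validated content, which also preserves the doctor's final review authority over stage 1.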

Section 08

Research Limitations and Future Directions

Limitations:

  • Only based on text analysis, not involving multi-modal interactions (e.g., voice, facial expressions);
  • Samples are from English contexts; cross-language applicability remains to be verified;
  • The impact on long-term doctor-patient relationships has not been fully investigated.

Future Directions:

  • Develop medical-specific alignment training methods;
  • Explore consistency in multi-turn dialogue contexts;
  • Study the actual impact of AI-assisted communication on medical outcomes.