# Can AI Be a Doctor? A Study on the Empathy, Readability, and Alignment of Clinical LLMs

> The study conducts a multi-dimensional evaluation of the communication capabilities of general-purpose and domain-specific LLMs in medical scenarios, finding that base models exhibit amplified emotional polarity and excessively high language complexity. While empathy prompts can reduce negative emotions and reading difficulty, the collaborative rewriting strategy performs best in terms of semantic fidelity, readability, and emotional regulation. The study indicates that LLMs are more suitable as an enhancement tool for clinical communication than as a replacement.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Posted: 2026-04-22T17:17:27.000Z
- Last activity: 2026-04-23T02:54:07.954Z
- Popularity: 152.4
- Keywords: medical AI, clinical LLM, doctor-patient communication, empathy, readability, semantic alignment, human-AI collaboration, medical ethics, patient experience
- Page link: https://www.zingnex.cn/en/forum/thread/ai-llm-74504152
- Canonical: https://www.zingnex.cn/forum/thread/ai-llm-74504152
- Markdown source: floors_fallback

---

## [Introduction] Can AI Be a Doctor? Role Positioning and Capability Evaluation of Clinical LLMs

This article centers on the question "Can AI become a doctor?", evaluating the performance of general-purpose and medical-specific LLMs across multiple dimensions in doctor-patient communication scenarios. The study finds that current LLMs amplify emotional polarity and produce excessively complex language, but that both issues can be effectively mitigated through collaborative rewriting strategies. The conclusion: LLMs are better suited as an enhancement tool for clinical communication than as a replacement for doctors.

## Research Background: AI Penetrates the Medical Field, Communication Capabilities to Be Verified

Large language models are rapidly entering medical scenarios (symptom self-check, medication guidance, etc.), but the question of whether AI's communication with patients meets clinical standards has not been fully answered. This study aims to systematically evaluate the performance of general-purpose and medical-specific LLMs in real doctor-patient interactions and reveal their capability boundaries and limitations.

## Evaluation Framework: Three-Dimensional Alignment Analysis

The study uses a three-dimensional evaluation system to measure the alignment of AI with clinical standards:
1. **Semantic Fidelity**: Accuracy of medical facts and correctness of information transmission;
2. **Readability**: Whether the text complexity is suitable for patient understanding (balancing professionalism and accessibility);
3. **Emotional Resonance**: Appropriateness of emotional polarity and empathetic expression (conveying information while providing emotional support).
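To make the first dimension concrete, semantic fidelity can be scored by comparing a rewritten response against the original clinical content. The sketch below uses cosine similarity over token-count vectors as a minimal stdlib-only stand-in; this is an assumption for illustration (the study would more plausibly use sentence embeddings), and the example sentences are invented:

```python
import math
import re
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over simple token-count vectors.
    Illustration only; embedding-based similarity is the realistic choice."""
    va, vb = (Counter(re.findall(r"[a-z']+", t.lower())) for t in (a, b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

doctor = "The biopsy shows no malignancy, so no further treatment is needed."
rewrite = "Good news: the biopsy shows no malignancy, so you will not need further treatment."
print(round(cosine_similarity(doctor, rewrite), 2))
```

A high score here only certifies lexical overlap; a real fidelity check must also catch paraphrases that silently change a dosage or a diagnosis, which is why embedding-based scoring plus expert review is the safer combination.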

## Baseline Findings and Effectiveness of Optimization Strategies

**Baseline Issues**:
- Amplified emotional polarity: base models produce a higher proportion of negative emotion (43.14%-45.10%) than doctors (37.25%), which risks exacerbating patient anxiety;
- Excessively high language complexity: the FKGL (Flesch-Kincaid Grade Level) of GPT-5/Claude outputs reaches 16.91-17.60 (graduate level), whereas doctors' responses sit at 11.47-12.50 (senior high school level).

**Optimization Strategies**:
- Empathy prompts: reduce extreme negative emotion and lower reading difficulty, but do not significantly improve semantic fidelity;
- Collaborative rewriting (restatement mode): highest semantic similarity (average 0.93), improved readability, and effective control of emotional extremes.
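The FKGL gap above is easy to reproduce in spirit. The sketch below implements the standard Flesch-Kincaid formula with a rough vowel-group syllable heuristic (a simplification; dedicated libraries such as `textstat` count syllables more carefully, so absolute scores will differ from the study's), applied to two invented example replies:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discount a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

plain = "Your test came back fine. You can stop the pills now."
jargon = ("The serological assay demonstrated unremarkable parameters, "
          "so discontinuation of the pharmacological regimen is indicated.")
print(round(fkgl(plain), 2), round(fkgl(jargon), 2))
```

Jargon-dense phrasing drives the grade level up through both longer sentences and more syllables per word, which is exactly the lever a rewriting stage can pull without altering the medical content.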

## Evaluation from the Perspective of Two Stakeholders

**Doctor's Perspective**: Doctors rated AI below real doctors on medical accuracy, clinical reasoning, and the soundness of diagnosis and treatment suggestions, and emphasized AI's auxiliary role;

**Patient's Perspective**: Patients preferred the rewritten AI responses, finding their clarity and emotional tone more satisfying.

## Core Conclusion: LLMs Are Clinical Communication Enhancers Rather Than Replacements

Based on the combined findings, the most effective role for LLMs in medical scenarios is a **collaborative communication enhancer**, not a replacement for clinical expertise. In terms of functional positioning, AI should focus on improving the quality and efficiency of communication rather than taking over doctors' diagnostic authority; the ideal model is that doctors provide the professional judgments and AI helps optimize how they are expressed.

## Practical Implications: Guiding Directions for Medical AI Development and Deployment

Recommendations for developers and deployers:
1. **Layered Processing**: Separate content generation (ensuring accuracy) from expression optimization (adjusting style and emotion);
2. **Readability First**: Control the density of technical terms, organize information clearly, and provide background explanations;
3. **Emotional Calibration**: Automatically detect and adjust overly emotional expressions, and adjust empathy according to the context;
4. **Continuous Supervision**: Maintain doctors' final review authority in key decisions and position AI as an auxiliary tool.
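The four recommendations above can be sketched as one layered pipeline. Everything in this sketch is hypothetical scaffolding rather than the study's implementation: the calibration lexicon, the function names, and the identity-rewriter stub are all assumptions for illustration (a real deployment would use a learned sentiment model and an LLM rewriter):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical lexicon mapping alarming wording to calmer equivalents
# (assumption: real systems would use a sentiment model, not a word list).
CALIBRATION = {"fatal": "serious", "terminal": "advanced", "hopeless": "difficult"}

@dataclass
class DraftResponse:
    content: str            # stage 1: clinically accurate draft
    approved: bool = False  # stage 4: doctor retains final review authority

def calibrate_tone(text: str) -> str:
    """Stage 3: soften overly emotional wording without touching the facts."""
    for harsh, softer in CALIBRATION.items():
        text = text.replace(harsh, softer)
    return text

def optimize_expression(draft: DraftResponse,
                        rewriter: Callable[[str], str]) -> DraftResponse:
    """Stage 2: style-only rewriting, kept separate from content generation."""
    return DraftResponse(content=calibrate_tone(rewriter(draft.content)))

def release(draft: DraftResponse) -> str:
    """Only doctor-approved text reaches the patient."""
    if not draft.approved:
        raise PermissionError("doctor review required before release")
    return draft.content

draft = DraftResponse("Untreated, this infection can be fatal, "
                      "so please start the antibiotics today.")
polished = optimize_expression(draft, rewriter=lambda t: t)  # identity stub
polished.approved = True  # simulated doctor sign-off
print(release(polished))
```

The key design choice is that the rewriting stage never originates medical content and the release gate is unconditional, which operationalizes "enhancer, not replacement" in code.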

## Research Limitations and Future Directions

**Limitations**:
- Only based on text analysis, not involving multi-modal interactions (e.g., voice, facial expressions);
- Samples are from English contexts, cross-language applicability to be verified;
- The impact on long-term doctor-patient relationships has not been fully investigated.

**Future Directions**:
- Develop medical-specific alignment training methods;
- Explore consistency in multi-turn dialogue contexts;
- Study the actual impact of AI-assisted communication on medical outcomes.
