
When AI Becomes an Advisor: A Comparative Study of Large Language Models and Human Advice in Digital Health Consultation

A joint research team from the University of Toronto and Harvard University published a paper at CHI 2026. Through a two-part study involving 210 participants, they found that advice generated by GPT-4o significantly outperforms top-voted human advice on Reddit in terms of effectiveness, warmth, and willingness to seek advice again, providing important insights for the design of AI-driven health consultation systems.

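Generative Engine Optimization | Large Language Models | AI Advice | Digital Health | Human-AI Collaboration | CHI 2026 | GPT-4o | Crowdsourcing | Algorithmic Curation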

Published 2026-04-13 08:00 | Recent activity 2026-04-16 23:54 | Estimated read 7 min


Section 01

[Introduction] AI Advice vs. Human Advice: CHI 2026 Study Reveals GPT-4o's Comprehensive Advantages in Digital Health Consultation

A joint research team from the University of Toronto and Harvard University published a paper at CHI 2026. Through a two-part study involving 210 participants, they found that health advice generated by GPT-4o significantly outperforms top-voted human advice on Reddit in terms of effectiveness, warmth, and willingness to seek advice again. Additionally, they explored a new model of algorithmic curation for human-AI collaboration, providing important insights for the design of AI-driven health consultation systems.

Section 02

Research Background and Motivation: Can AI Advice Surpass Human Wisdom?

Seeking advice is one of the core human behaviors reshaped by the Internet. From early forum communities to today's Q&A platforms, the web has long served as a venue for crowdsourced public guidance. With the rise of large language models (LLMs), the way people obtain advice is undergoing a second transformation: they now turn directly to AI for life guidance. A key question remains, however. How good is LLM-generated advice, and in highly personal and emotional scenarios such as daily well-being, can it match or even surpass human wisdom? This study sets out to answer that question systematically.

Section 03

Research Design and Methods: Two-Part Study + 210 Participants

The research team designed two complementary studies, recruiting a total of 210 participants. In the first study, experts conducted a blind comparison between top-voted human comments on Reddit and LLM-generated advice. The second study explored the possibility of algorithmic curation—how to organically combine human and AI advice. The research scenarios focused on daily well-being issues, covering common consultation areas such as interpersonal relationships, career development, and mental health, ensuring practicality and reference value for real-world scenarios.
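The blind comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' protocol: the `Advice` class, the `build_blind_trial` function, the A/B labels, and the placeholder texts are all assumptions introduced here. It shows the general idea of blinding: raters see the two pieces of advice in random order with no source labels, while the answer key is kept separately for later analysis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Advice:
    source: str  # "human" or "llm"; never shown to raters (hypothetical field)
    text: str


def build_blind_trial(human: Advice, llm: Advice, rng: random.Random):
    """Return (display, key).

    display maps neutral labels "A"/"B" to the advice text only.
    key maps the same labels to the hidden source, for analysis afterwards.
    """
    pair = [human, llm]
    rng.shuffle(pair)  # randomize which source appears as "A"
    labels = ["A", "B"]
    display = {label: item.text for label, item in zip(labels, pair)}
    key = {label: item.source for label, item in zip(labels, pair)}
    return display, key


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be reproduced
    display, key = build_blind_trial(
        Advice("human", "Placeholder top-voted comment."),
        Advice("llm", "Placeholder model-generated advice."),
        rng,
    )
    print(display)  # what a rater would see: texts without source labels
```

Keeping the key in a separate structure means the rating interface never has access to the source, which is the property a blind comparison depends on.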

Section 04

Key Findings: GPT-4o Outperforms Across Key Metrics

Effectiveness

Expert evaluations show that GPT-4o-generated advice is more structured and actionable than the human advice, avoids the subjective assumptions and emotional expressions found in human comments, and performs better at problem-solving.

Warmth

GPT-4o demonstrates a nuanced understanding of human emotions and uses appropriate language to convey empathy. It outperformed human commenters whose replies came across as cold or blunt because of bias or poorly chosen wording.

Willingness to Seek Again

Participants were more willing to seek advice from AI again, reflecting AI's potential in building long-term user trust and satisfaction.

Model Comparison

GPT-4o outperformed GPT-5 on every metric except the 'flattery' dimension, where it was at a slight disadvantage. This suggests that improvements on benchmark tests do not necessarily translate into advantages in practical use, and that advice generation requires specialized optimization.

Section 05

Algorithmic Curation: Potential of a Hybrid Model for Human-AI Collaboration

The second study found that human comments can be 'polished' to a level competitive with AI-generated content through algorithmic curation. This indicates that the future advice ecosystem does not have to be an either-or choice; a hybrid model can be built: AI provides structured, high-quality initial advice, and human experts review, supplement, and add emotional refinement.
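The hybrid model described above (AI drafts, a human reviews and refines) can be pictured as a two-stage pipeline. The sketch below is purely illustrative: the source does not describe an implementation, so `AdviceDraft`, `hybrid_advice`, and both stub functions are hypothetical names, and the stubs stand in for a real model call and a real human review step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AdviceDraft:
    question: str
    text: str
    origin: str = "ai"       # who produced the first draft (assumed label)
    reviewed: bool = False   # set once a human has signed off
    notes: List[str] = field(default_factory=list)  # reviewer additions


def hybrid_advice(
    question: str,
    draft_fn: Callable[[str], str],
    review_fn: Callable[[AdviceDraft], AdviceDraft],
) -> AdviceDraft:
    """Stage 1: generate a structured draft. Stage 2: pass it to a reviewer."""
    draft = AdviceDraft(question=question, text=draft_fn(question))
    return review_fn(draft)


def stub_draft(question: str) -> str:
    # Stand-in for an LLM call; returns a fixed, structured placeholder.
    return f"1. Clarify the situation. 2. List options. (re: {question})"


def stub_review(draft: AdviceDraft) -> AdviceDraft:
    # Stand-in for a human expert adding emotional refinement.
    draft.notes.append("Added an empathetic opening line.")
    draft.reviewed = True
    return draft


if __name__ == "__main__":
    result = hybrid_advice("How do I handle burnout?", stub_draft, stub_review)
    print(result.reviewed, result.notes)
```

Passing the draft and review steps in as functions keeps the pipeline independent of any particular model or review tool, so either stage could be swapped without changing the flow.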

Section 06

Implications for GEO and AI Search

New Standards for Content Quality: As AI becomes a primary channel for information, content needs 'dual optimization' for both human readers and AI systems.
Credibility and Transparency: Displaying a piece of content's sources, supporting evidence, and limitations helps build user trust.
Advantages of Structured Content: The effectiveness of AI advice stems from its structure and actionability. Creators should use clear structures and provide executable steps to improve their content's visibility to AI systems.

Section 07

Research Limitations and Future Directions

Limitations: The study focuses on well-being advice, so its generalizability to other domains still needs to be verified. AI advice also carries a risk of 'flattery', that is, over-catering to users.
Future directions: Incorporate additional evaluation dimensions, such as long-term effect tracking and real user satisfaction surveys.

Section 08

Conclusion: AI Redefines the Way Advice is Obtained

This study paints a picture of AI redefining the way advice is obtained. GPT-4o's strong results across the measured metrics support the practical value of LLMs and lend confidence to AI applications in fields such as digital health consultation and online education. For GEO and AI search professionals, optimization in the AI era needs to combine technical capabilities with an understanding of human needs in order to build an intelligent advice ecosystem that truly benefits users.