# Using LLMs to Analyze 'Sacred Cow Urine' Health Misinformation on Indian YouTube: How Cultural Confusion Deceives Algorithms and Humans

> A research team from the University of Michigan developed an LLM-based discourse analysis framework for identifying and analyzing the hybrid rhetorical strategies in health promotion content about 'gomutra' (cow urine) on Indian YouTube. The project reveals how traditional cultural metaphors and pseudoscientific discourse intertwine into a complex discourse system that challenges LLMs trained primarily on Western corpora.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-21T17:15:05.000Z
- Last activity: 2026-04-21T17:21:34.979Z
- Heat: 150.9
- Keywords: health misinformation, large language models, discourse analysis, cultural confusion, YouTube, content moderation, multilingual processing, computational social science
- Page link: https://www.zingnex.cn/en/forum/thread/llmyoutube
- Canonical: https://www.zingnex.cn/forum/thread/llmyoutube
- Markdown source: floors_fallback

---

## Introduction

A research team from the University of Michigan developed a discourse analysis framework based on large language models (LLMs) to study the hybrid rhetorical strategies used in health promotion content about 'gomutra' (cow urine) on Indian YouTube. The project shows how traditional cultural metaphors and pseudoscientific discourse interweave into a complex discourse system that challenges both LLMs trained primarily on Western corpora and conventional content moderation mechanisms, offering a new lens on health misinformation rooted in cultural confusion.

## Research Background: Intertwining of Traditional Culture and Health Misinformation, and Moderation Challenges

In India, cow urine (gomutra) is regarded by some groups as a traditional substance with sacred healing properties. In recent years, as such content has spread through platforms like YouTube, religious discourse and modern health science terminology have become conflated, producing a phenomenon of 'cultural confusion'. Traditional moderation methods based on keywords or shallow semantic analysis struggle to flag content that presents itself as cultural expression while spreading unsubstantiated health claims; moreover, the content often mixes English, Hindi, and Urdu, further complicating automated analysis.

## Research Design: Multi-Stage Discourse Analysis Framework Assisted by LLMs

The study constructed a post-hoc analysis framework to evaluate the limitations of mainstream LLMs in handling culturally confused content. The steps (a measurement sketch follows this list) include:

1. Sample collection: 30 multilingual videos, covering both promotional and debunking content.
2. Audio transcription: the OpenAI Whisper large model, with manual proofreading of 16% of samples, achieving an average word error rate (WER) of 7.04%.
3. Term extraction: GPT-4o identifies traditional cultural metaphors (religious symbols, traditional medicine concepts) and scientific terms (chemical components, etc.).
4. Intensity-word analysis: Gemini, GPT-4o-mini, and DeepSeek extract emphasis words under zero-/few-shot and formal/friendly tone conditions, with Cohen's kappa computed to evaluate inter-model consistency.
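A minimal sketch of the quality check in step 2, assuming the open-source `openai-whisper` and `jiwer` Python packages; the file names and the proofread reference transcript are hypothetical placeholders, not the authors' released scripts:

```python
# Sketch: transcribe a sampled video with Whisper, then check the
# transcript against a manually proofread reference via word error rate.
# Assumes: pip install openai-whisper jiwer
import whisper
import jiwer

# Load the large Whisper model used in the study (step 2).
model = whisper.load_model("large")
result = model.transcribe("sample_video.mp4")  # hypothetical file name
hypothesis = result["text"]

# Compare against a human-proofread reference transcript; the study
# proofread 16% of samples and reports an average WER of 7.04%.
with open("sample_video_reference.txt", encoding="utf-8") as f:
    reference = f.read()
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```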

## Key Findings: Systemic Limitations of Mainstream LLMs in Handling Culturally Confused Content

1. Western-centric training corpora bias the models' understanding of Indian traditional medicine (e.g., Ayurveda) and religious metaphors, making it difficult to judge how misleading the juxtaposition of traditional and scientific terms is.
2. Multilingual mixing (code-switching) increases analysis difficulty; a script-level toy illustration follows this list.
3. Models underestimate the correlation between emotional intensity and factual accuracy: promotional content uses strong words like 'miraculous' while debunking content is restrained in expression, yet cultural expressions themselves carry emotion, which easily leads to misjudgment.
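To make the code-switching difficulty concrete: Hindi is typically written in Devanagari (U+0900 to U+097F) and Urdu in Arabic script (U+0600 to U+06FF), so a single sentence can hop between three writing systems. The toy heuristic below is an illustration only, not part of the study's pipeline:

```python
# Toy illustration (not from the paper): tag each token of a mixed
# English/Hindi sentence by Unicode script block.
def script_of(token: str) -> str:
    for ch in token:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:
            return "Devanagari (Hindi)"
        if 0x0600 <= cp <= 0x06FF:
            return "Arabic (Urdu)"
        if ch.isascii() and ch.isalpha():
            return "Latin (English)"
    return "other"

# A constructed code-switched claim of the kind the study describes.
sentence = "gomutra में antioxidants हैं"  # "gomutra contains antioxidants"
for token in sentence.split():
    print(f"{token!r:20} -> {script_of(token)}")
```

A keyword filter scanning only one script would see just fragments of such a claim, which is why the study treats code-switching as a first-class difficulty.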

## Methodological Innovations and Ethical Considerations: Balancing Transparency and Responsibility

Methodological innovations: the team publicly releases its evaluation scripts (a WER calculator, an F1 evaluator, and a kappa analyzer) together with the prompt templates used for GPT-4o, Gemini 2.5 Pro, and DeepSeek, enhancing replicability (a minimal stand-in for the kappa analyzer is sketched below). Ethical considerations: personal information of viewers and commenters is excluded, the dataset is limited to non-commercial academic use, and access is controlled via email application, balancing research value against privacy protection.
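The released kappa analyzer is not reproduced here; the following is a minimal stand-in using scikit-learn's `cohen_kappa_score`, with invented per-token labels, showing how agreement between two models' emphasis-word extractions can be scored:

```python
# Minimal stand-in for a kappa analyzer (invented labels, not the
# authors' released script): score agreement between two models that
# each mark transcript tokens as emphasis words (1) or not (0).
from sklearn.metrics import cohen_kappa_score

tokens     = ["gomutra", "is", "a", "miraculous", "divine", "cure"]
gpt4o_mini = [0, 0, 0, 1, 1, 1]  # hypothetical per-token labels
deepseek   = [0, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(gpt4o_mini, deepseek)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 would mean perfect agreement
```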

## Practical Implications: Providing New Directions for Content Moderation and Fact-Checking

For platforms: introduce a multi-dimensional analysis framework that not only detects factual accuracy but also analyzes the rhetorical strategies of mixed traditional and scientific discourse, and strengthen multilingual, cross-cultural understanding; a hypothetical representation of such signals is sketched after this paragraph. For fact-checking organizations: LLMs can assist in surfacing suspicious rhetorical patterns, but the final judgment requires the cultural sensitivity of human experts.
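As a purely hypothetical illustration of that recommendation (no such schema appears in the source), a multi-dimensional moderation signal might combine factual flags with rhetorical and linguistic scores, routing borderline cases to culturally literate human reviewers:

```python
# Hypothetical schema (not from the paper): a moderation signal that
# captures rhetorical mixing and code-switching alongside factual flags.
from dataclasses import dataclass

@dataclass
class ModerationSignal:
    video_id: str
    factual_flags: int         # count of unsubstantiated health claims
    discourse_mixing: float    # 0-1: traditional terms juxtaposed with scientific ones
    code_switching: float      # 0-1: share of tokens outside the dominant script
    emphasis_intensity: float  # 0-1: density of strong words like "miraculous"

    def needs_human_review(self) -> bool:
        # High rhetorical risk warrants review even without hard
        # factual flags, since the deception lies in the mixing itself.
        return self.factual_flags > 0 or (
            self.discourse_mixing > 0.5 and self.emphasis_intensity > 0.5
        )
```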

## Limitations, Future Directions, and Conclusion: Balancing Technical Neutrality and Humanistic Care

Limitations: a small sample (N=30), a single topic, and a post-hoc design that does not track propagation dynamics. Future directions: broader topic coverage, incorporation of user behavior data, and multimodal analysis combining video visuals with audio intonation. Conclusion: technical solutions need to be paired with humanistic care; efforts against misinformation should avoid cultural stigmatization while building an effective and responsible information ecosystem.
