Zing Forum

Discourse Analysis of Generative AI on Reddit: A Computational Social Science Practice of Social Media Mining

A computational social science project built on Reddit data. It uses sentiment analysis and BERTopic topic modeling to analyze how different communities discuss generative AI, what sentiment they express, and how that discourse evolves, mapping public perception as the technology enters everyday social use.

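Tags: social media mining, sentiment analysis, BERTopic, topic modeling, generative AI, Reddit, computational social science, public discourse, sociology of technology, natural language processing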
Published 2026-05-18 11:38 | Recent activity 2026-05-18 11:53 | Estimated read 7 min

Section 01

Introduction: Core Research Framework of Generative AI Discourse Analysis on Reddit

This study is a computational social science exercise based on Reddit data. Using sentiment analysis and BERTopic topic modeling, it systematically analyzes discussion patterns, sentiment, and discourse evolution around generative AI across communities, with the aim of mapping public perception as the technology is absorbed into society. It compares discourse across three types of communities (creative, technical, and comprehensive, i.e. general-interest), traces how discussion changes over time, and is intended as a reference for AI developers, policymakers, and researchers.

Section 02

Research Background and Research Questions

Generative AI has moved from the laboratory into everyday public use, yet conventional technology evaluation still concentrates on model performance and largely overlooks feedback from real usage contexts. Social media platforms such as Reddit offer a window onto this "technology socialization", and users' spontaneous discussions are a valuable data source. The study asks four questions: How do different communities discuss generative AI? How does emotional tone differ between them? Which topics dominate, and how do they evolve? How do discourse frames differ across community types?

Section 03

Data Sources and Research Method Design

Data Collection: Post data (ID, subreddit, title, body, etc.) and comment data are obtained from Reddit's public JSON endpoints. Nested comment trees are recursively flattened, and request intervals are set to comply with platform rules.

Community Selection: Nine subreddits are chosen by stratified sampling across three categories, creative (e.g., r/Midjourney), technical (e.g., r/OpenAI), and comprehensive (e.g., r/technology), so that the discourse of users from different backgrounds can be compared.

Preprocessing: Raw text is cleaned (case unified, URLs and stop words removed, tokens lemmatized, etc.) to standardize the input format while retaining semantic information.
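
The flattening and cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the tiny stop-word list, and the output row layout are assumptions, and lemmatization is left out because it needs an external library. The input shape assumed here is Reddit's comment Listing, where each child of kind "t1" is a comment whose "replies" field is either another Listing or an empty string.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")
# Tiny illustrative stop-word list (an assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "in", "it", "this", "that", "for", "on"}


def flatten_comments(listing, depth=0):
    """Recursively flatten a nested comment Listing into a list of flat rows."""
    rows = []
    if not isinstance(listing, dict):  # leaf comments carry replies == ""
        return rows
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" placeholders and other kinds
            continue
        data = child["data"]
        rows.append({
            "id": data["id"],
            "parent_id": data.get("parent_id"),
            "depth": depth,
            "body": data.get("body", ""),
        })
        # Descend into the replies of this comment, one level deeper.
        rows.extend(flatten_comments(data.get("replies"), depth + 1))
    return rows


def clean_text(text):
    """Lowercase, strip URLs, tokenize, and drop stop words (no lemmatization)."""
    text = URL_RE.sub(" ", text.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]
```

Keeping the depth and parent_id of each comment in the flat rows means the thread structure can still be reconstructed later, for example to compare top-level comments with deep replies.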

Section 04

Analysis Techniques: Sentiment Analysis and BERTopic Modeling

Sentiment Analysis: A hybrid lexicon plus machine learning method is applied separately to posts (topic initiation) and comments (discussion participation) so that their emotional patterns can be distinguished. Combined with the time dimension, it is used to identify how events affect public sentiment.

BERTopic Topic Modeling: Documents are encoded as vectors with a pre-trained language model, reduced in dimensionality with UMAP, clustered with HDBSCAN, and labeled by extracting representative terms with c-TF-IDF. The result is a set of interpretable discussion topics (e.g., technical tutorials, ethical concerns, tool evaluations).
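
To make the last step of that pipeline concrete, here is a small standard-library sketch of class-based TF-IDF (c-TF-IDF). It assumes the embedding, UMAP, and HDBSCAN steps have already assigned each tokenized document to a topic; the function name and input layout are hypothetical. Each topic's documents are merged into one "class document", term frequency is normalized within the class, and the weight is tf * log(1 + A / f_t), where A is the average number of words per class and f_t is the frequency of term t across all classes.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic, top_n=3):
    """Rank terms per topic with class-based TF-IDF.

    docs_by_topic maps a topic id to a list of tokenized documents.
    Returns a dict mapping each topic id to its top_n terms.
    """
    if not docs_by_topic:
        return {}
    # Merge each topic's documents into a single bag of term counts.
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # f_t: frequency of each term summed over all classes.
    total_per_term = Counter()
    for counts in class_counts.values():
        total_per_term.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_per_term.values()) / len(class_counts)

    top_terms = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        scores = {
            term: (freq / n_words) * math.log(1 + avg_words / total_per_term[term])
            for term, freq in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        top_terms[topic] = [term for term, _ in ranked[:top_n]]
    return top_terms
```

Terms that are frequent inside one topic but rare across the others score highest, which is why the extracted words tend to read as usable topic labels.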

Section 05

Key Findings: Cross-Community Differences and Discourse Evolution

Cross-Community Comparison: Creative communities focus on showcasing work and sharing usage techniques (AI as a creative tool); technical communities focus on model principles and performance optimization (AI as a technical system); comprehensive communities discuss social impact and future trends (AI as a social force).

Temporal Dimension: Discussion volume fluctuates with major technical events (e.g., the ChatGPT release, the GPT-4 launch). Over time the topical focus shifts from basic functionality toward advanced techniques, critical reflection, and ecosystem building, reflecting the growing maturity of the technology.
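
One simple way to examine both dimensions together is to bucket posts by community category and calendar month, then look for spikes around known release dates. The sketch below is an assumption about how such a tally could be built, not the study's own code. The mapping covers only the three subreddits named in the text (the remaining six are not listed), and created_utc is taken to be the Unix timestamp Reddit attaches to each post.

```python
from collections import Counter
from datetime import datetime, timezone

# Only the three example subreddits named in the text; the other six are unknown.
CATEGORY = {
    "Midjourney": "creative",
    "OpenAI": "technical",
    "technology": "comprehensive",
}


def monthly_volume(posts):
    """Count posts per (category, "YYYY-MM") bucket from created_utc timestamps."""
    counts = Counter()
    for post in posts:
        category = CATEGORY.get(post["subreddit"])
        if category is None:  # subreddit outside the sampled set
            continue
        month = datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).strftime("%Y-%m")
        counts[(category, month)] += 1
    return counts
```

The same bucketing can carry a mean sentiment score instead of a raw count, which would let volume and tone be compared month by month.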

Section 06

Practical Significance and Research Limitations

Practical Value: The analysis gives AI developers a macro-level view of user feedback to guide product design, gives policymakers a timely reference on public perception, and shows researchers a workable methodological path for computational social science.

Limitations: Reddit's user base is demographically skewed; platform characteristics such as anonymity and the voting mechanism shape how discourse is expressed; and sentiment analysis is less accurate on colloquial or sarcastic text.

Section 07

Conclusion and Outlook

Through social media mining, this study characterizes public discourse on generative AI and offers a reference for multiple stakeholders. The same analysis framework can be reused to track discourse evolution over time, identify emerging topics, and deepen understanding of how technology and society interact.