Zing Forum

Interdisciplinary Discourse Mapping: Analyzing the Decade-Long Evolution of Educational Research Using LLM and BERTopic

This project uses large language models (LLMs) and topic modeling to analyze discourse across journal abstracts in the learning sciences and educational technology from 2015 to 2025, showing how computational social science methods can be applied to tracking academic trends.

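Computational social science · Topic modeling · BERTopic · Academic discourse analysis · Large language models · Educational research · Interdisciplinary analysis · Text mining · Bootstrap inference · Knowledge graph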
Published 2026-04-29 13:40 · Recent activity 2026-04-29 14:06 · Estimated read 7 min

Section 01

[Introduction] Interdisciplinary Discourse Mapping: Analyzing the Decade-Long Evolution of Educational Research Using LLM and BERTopic

This article introduces a computational social science project that uses large language models (LLMs) and BERTopic topic modeling to analyze discourse across journal abstracts in the learning sciences and educational technology from 2015 to 2025. The project builds a discourse map of the two fields, offering a data-driven view of how educational research has evolved and showing how computational social science methods can be applied to tracking academic trends.

Section 02

Research Background: Need for Discourse Analysis in Learning Sciences and Educational Technology

The learning sciences grew out of cognitive science and educational psychology and focus on the underlying mechanisms of learning; educational technology focuses on the design and application of technical tools. Both fields have developed rapidly over the past decade, yet quantitative research on knowledge flow, conceptual overlap, and discourse differentiation between them is lacking. Traditional content analysis based on manual coding is time-consuming and labor-intensive. Natural language processing opens new possibilities, but two methodological challenges remain: combining topic discovery with domain knowledge, and verifying the validity of the algorithms' output.

Section 03

Project Design and Technical Route: Multi-stage Analysis Process

Core Objectives: Address questions such as what the core themes of each field are, how those themes evolve, which themes the two fields share and where they differ, and whether LLM-assisted topic annotation improves interpretability.

Data Scope: Abstracts from major journals in both fields, published from 2015 to 2025, covering a decade of trends.

Technical Process:
1. Text preprocessing (cleaning, tokenization, stopword removal)
2. BERTopic topic modeling
3. LLM-assisted topic annotation
4. Cross-domain comparative analysis
5. Statistical verification (Bootstrap inference, sensitivity analysis)
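The preprocessing step (cleaning, tokenization, stopword removal) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the stopword list, the function names `clean`, `tokenize`, and `preprocess`, and the sample sentence are all assumptions made for the example.

```python
import re

# Assumption: a tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for",
             "is", "are", "this", "that", "with", "we"}


def clean(text: str) -> str:
    """Replace HTML-like tags with spaces and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    """Lowercase the text and keep alphabetic tokens, allowing inner hyphens or apostrophes."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def preprocess(text: str) -> list:
    """Clean, tokenize, and drop stopwords."""
    return [tok for tok in tokenize(clean(text)) if tok not in STOPWORDS]


# Hypothetical abstract fragment, made up for this example.
example = "<p>This study examines the   effects of game-based learning on metacognition.</p>"
print(preprocess(example))
# ['study', 'examines', 'effects', 'game-based', 'learning', 'metacognition']
```

In embedding-based pipelines, stopword removal is often applied only when extracting topic keywords, while the embedding model sees lightly cleaned full sentences. The source does not say which approach this project took.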

Section 04

Core Technology Analysis: Integrated Application of BERTopic and LLM

BERTopic Topic Modeling: Topics are generated in three steps: Sentence-BERT document embeddings, UMAP dimensionality reduction followed by HDBSCAN clustering, and c-TF-IDF to extract representative terms for each cluster. Advantages over LDA: it captures semantic relationships by clustering on embeddings rather than bag-of-words counts, determines the number of topics automatically, and can surface fine-grained subtopics.

LLM-Assisted Annotation: Representative documents and keywords are extracted from each topic cluster and passed to an LLM, which generates a label; the labels are then manually reviewed and revised. This addresses the difficulty of naming topics.

Statistical Verification (Bootstrap Inference, Sensitivity Analysis): Bootstrap resampling is used to estimate confidence intervals; robustness is checked by varying parameters such as the UMAP neighbor count and the HDBSCAN minimum cluster size.
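To make the Bootstrap step concrete, the sketch below computes a percentile bootstrap confidence interval for the share of documents assigned to one topic. It assumes the topic assignments are held fixed and only the documents are resampled; the source does not say whether the project resampled assignments or refit the model on each resample. The function name `bootstrap_share_ci` and the toy assignments are hypothetical.

```python
import random


def bootstrap_share_ci(topic_ids, target, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the share of documents assigned to `target`."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(topic_ids)
    shares = []
    for _ in range(n_boot):
        # Resample documents with replacement, keeping the sample size fixed.
        sample = rng.choices(topic_ids, k=n)
        shares.append(sum(1 for t in sample if t == target) / n)
    shares.sort()
    lower = shares[int((alpha / 2) * n_boot)]
    upper = shares[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up toy assignments: 100 documents spread over three topic ids.
topic_ids = [0] * 30 + [1] * 50 + [2] * 20
lower, upper = bootstrap_share_ci(topic_ids, target=0)
print(f"share of topic 0 = 0.30, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

A sensitivity check could reuse the same function on the assignments produced under each parameter setting (for example, different UMAP neighbor counts) and compare the resulting intervals.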

Section 05

Research Findings: Theme Evolution and Interdisciplinary Convergence in the Education Field

Domain-Specific Themes: The learning sciences focus on cognitive load, situated learning, metacognition, and related topics; educational technology focuses on online learning platforms, mobile learning, virtual reality (VR), and similar tools.

Interdisciplinary Convergence: The two fields share themes such as design-based research, evidence-based practice, and learning analytics.

Temporal Evolution: Traditional themes (e.g., behaviorism) decline, emerging themes (e.g., generative AI applications in education) rise, and themes such as personalized learning resurface periodically.
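Rising and declining themes of this kind are typically read off per-year topic shares. The sketch below shows one way to compute them from (year, topic) pairs. The function name `yearly_topic_share`, the placeholder labels, and the records are invented for illustration and are not results from the project.

```python
from collections import Counter, defaultdict


def yearly_topic_share(records):
    """Map each year to {topic: share of that year's documents}."""
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in sorted(counts.items())
    }


# Made-up toy records with placeholder topic labels.
records = [
    (2015, "topic_a"), (2015, "topic_a"), (2015, "topic_b"), (2015, "topic_a"),
    (2025, "topic_a"), (2025, "topic_b"), (2025, "topic_b"), (2025, "topic_b"),
]
shares = yearly_topic_share(records)
for year, dist in shares.items():
    print(year, dist)
```

Normalizing within each year keeps a topic's trend from being driven by growth in the total number of published abstracts.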

Section 06

Methodology: Contributions and Limitations

Contributions: 1. LLMs augment the topic modeling process, improving automation and interpretability; 2. The cross-domain comparison framework can be extended to other disciplines; 3. Bootstrap inference and sensitivity analysis improve the reliability of the results.

Limitations: Analyzing only journal abstracts misses detail found in full texts; if only English-language data is used, non-English contributions are overlooked; topic trends show correlation only and do not support causal inference.

Section 07

Applications and Extensions: From Academic Monitoring to Interdisciplinary Cooperation

Application Scenarios: Monitoring academic trends to inform resource allocation; identifying opportunities for interdisciplinary collaboration; updating curriculum content; and supporting the choice of research topics.

Extension Directions: Multilingual analysis; citation network analysis to trace knowledge flow; full-text analysis; and real-time monitoring for continuous tracking.