Systematic Review of Large Language Models in Survey Research: A Panoramic Analysis from Text Classification to Data Generation

A systematic review of 136 studies analyzes how LLMs are currently applied across the entire survey research process, including where they succeed and where they fail. It finds that current research concentrates on three areas (text classification, data generation, and questionnaire design) and points out generalizability and reproducibility problems such as reliance on a single model family and a bias toward English-language contexts.

Tags: LLM, survey research, systematic review, text classification, data generation, questionnaire design, public opinion
Published 2026-06-15 17:45 | Recent activity 2026-06-15 17:49 | Estimated read 6 min

Section 01

[Introduction] Key Points of the Systematic Review of Large Language Models in Survey Research

This article summarizes a systematic review of 136 studies that analyzes how LLMs are currently applied across the entire survey research process, including where they succeed and where they fail. Key findings: LLM applications are concentrated in three areas (text classification, data generation, and questionnaire design); study designs raise generalizability and reproducibility concerns, such as reliance on a single model family and a bias toward English-language contexts; and LLMs are better suited to serve as advanced assistants to human researchers than as replacements for them.


Section 02

Research Background and Methods: Why Is This Review Needed?

LLMs are rapidly reshaping every stage of survey research (questionnaire design, data collection, and analysis of results), but existing reviews are unsystematic, narrow in scope, or purely theoretical. This review, led by Leah von der Heyde and colleagues, fills that gap: it evaluates 136 empirical studies both quantitatively and qualitatively, maps how LLMs are used before, during, and after a survey, and identifies the scenarios in which they perform well or poorly.


Section 03

Key Findings: Three Major Application Areas of LLMs in Survey Research

As of 2025, LLM applications are concentrated in three areas:

  1. Text Data Classification: A mature scenario. It can cut the time needed to process unstructured text from weeks to hours, but results are unstable when the text carries subtle semantics (e.g., sarcasm, culture-specific concepts);
  2. Survey Data Generation: Used for pilot studies, testing questionnaire logic, and similar tasks. Generated data can match the overall statistical distribution, but LLMs have limited ability to simulate individual heterogeneity;
  3. Survey Tool Development: LLMs can refine questionnaire wording, generate answer options, and so on, but may produce biased wording that requires human review.

Section 04

Hidden Concerns in Research Design: Crisis of Generality and Reproducibility

Most studies make the same narrow design choices:

  • Single Model Selection: Studies rely heavily on the GPT series, so results may reflect only the characteristics of those specific models;
  • Limited Prompting Methods: Zero-shot prompting is widely used, while few-shot and chain-of-thought prompting, which are more effective, are rarely adopted;
  • Anglocentrism: Applications in non-English contexts are under-researched, so generalizing conclusions directly to other languages is risky.
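
To make the difference between the prompting styles concrete, here is a minimal sketch of how a zero-shot and a few-shot classification prompt might be assembled for an open-ended survey response. The function names, prompt wording, and example labels are illustrative assumptions and are not taken from the review; no model is called, the code only builds the prompt text.

```python
from typing import Sequence, Tuple


def build_zero_shot_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a prompt that names the allowed labels but gives no worked examples."""
    label_list = ", ".join(labels)
    return (
        f"Classify the survey response into exactly one of: {label_list}.\n"
        f"Response: {text}\n"
        "Label:"
    )


def build_few_shot_prompt(
    text: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]],
) -> str:
    """Build a prompt that shows human-coded (response, label) pairs first."""
    label_list = ", ".join(labels)
    blocks = [f"Classify the survey response into exactly one of: {label_list}."]
    for example_text, example_label in examples:
        # Guard against demonstrations that use a label outside the codebook.
        if example_label not in labels:
            raise ValueError(f"Unknown label in example: {example_label!r}")
        blocks.append(f"Response: {example_text}\nLabel: {example_label}")
    # The response to be classified comes last, with the label left open.
    blocks.append(f"Response: {text}\nLabel:")
    return "\n\n".join(blocks)
```

The few-shot variant differs only in that it prepends human-coded demonstrations, which is why reporting the exact examples used matters for reproducibility.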

Section 05

Deep Insight: LLMs Are More Suitable for Assistance Than Replacement of Humans

LLMs excel at approximating broad aggregate patterns but struggle to capture subtle individual attitudes or complex constructs. Their optimal positioning is as 'human assistants':

  • Exploration phase: Quickly scan texts to identify topics;
  • Validation phase: Humans need to calibrate/verify LLM outputs;
  • Reporting phase: Assist in generating statistics and visualizations, but interpretive analysis requires human judgment.

The 'human-in-the-loop' model balances the advantages of scale and human expertise.
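
One way the validation phase could look in practice is to compare LLM labels with human codes on a small validation sample and route disagreements back to a human coder. The sketch below assumes Cohen's kappa as the agreement measure; that choice, like the function names, is an assumption for illustration and not a procedure prescribed by the review.

```python
from collections import Counter
from typing import List, Sequence


def cohens_kappa(human: Sequence[str], model: Sequence[str]) -> float:
    """Chance-corrected agreement between human codes and LLM labels."""
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("Both label lists must be non-empty and equally long.")
    n = len(human)
    # Share of items on which both raters chose the same label.
    observed = sum(h == m for h, m in zip(human, model)) / n
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Agreement expected by chance, from each rater's marginal label shares.
    expected = sum(human_counts[c] * model_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is formally
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def items_needing_review(human: Sequence[str], model: Sequence[str]) -> List[int]:
    """Return the positions where the LLM label differs from the human code."""
    return [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
```

For example, with human codes `["a", "a", "b", "b"]` and LLM labels `["a", "a", "b", "a"]`, `cohens_kappa` returns 0.5 and `items_needing_review` returns `[3]`, so only the fourth item would go back to a human coder.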

Section 06

Future Directions: Promoting the Maturity of the Field

The review puts forward three suggestions:

  1. Establish Model Selection Guidelines: Accumulate evidence from model comparisons to avoid defaulting to popular models;
  2. Develop Standardized Reporting Specifications: Clarify details such as prompt design and model versions to improve reproducibility;
  3. Develop Survey-Specific Benchmark Datasets: Build evaluation datasets that reflect real-world scenarios (e.g., sensitive topics, cross-group fairness).
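
As an illustration of the second suggestion, a study could store the details of each LLM run in a small machine-readable record. The sketch below is one possible shape for such a record; the field names and the `LLMRunReport` class are hypothetical and do not represent a reporting standard proposed by the review.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LLMRunReport:
    """Hypothetical record of the details needed to rerun an LLM-based step."""

    model_name: str          # model family, as named by the provider
    model_version: str       # exact version or snapshot identifier
    prompting_strategy: str  # e.g. "zero-shot", "few-shot", "chain-of-thought"
    prompt_template: str     # full prompt text, including any examples
    temperature: float       # sampling temperature used for the run
    survey_language: str     # language of the survey material

    def prompt_fingerprint(self) -> str:
        """Short SHA-256 prefix that changes whenever the prompt text changes."""
        digest = hashlib.sha256(self.prompt_template.encode("utf-8")).hexdigest()
        return digest[:12]

    def to_json(self) -> str:
        """Serialize the record, with the fingerprint, as sorted JSON."""
        record = asdict(self)
        record["prompt_fingerprint"] = self.prompt_fingerprint()
        return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)
```

Publishing a record like this alongside the results would let other researchers see which model version and prompt produced a given finding, and the fingerprint makes silent prompt changes easy to detect.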

Section 07

Original Research Information and Open-Source Resources

  • Original Authors: Leah von der Heyde, Florian Keusch, Trent Buskirk, Adam Eck
  • Source: GitHub project 'survai'
  • Original Title: AI in the Loop!? A Systematic Review of the Use of Large Language Models in Survey and Public Opinion Research
  • Link: https://github.com/leahvdh/survai
  • Publication Date: 2026-06-15

This project provides R code, coded data, and a literature library to support subsequent verification and expansion.