KWALLM: A Large Language Model-Based Qualitative Text Analysis Tool for Social Science Research

KWALLM is a qualitative text analysis application developed using R and Shiny, enabling non-technical users to perform analysis tasks such as text classification, topic extraction, and sentiment scoring using large language models.

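Tags: qualitative research, text analysis, large language models, R language, Shiny, social science, topic modeling, human-AI collaboration, PII redaction, computational social science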
Published 2026-06-07 04:45 | Recent activity 2026-06-07 04:49 | Estimated read 5 min
Section 01

Introduction

KWALLM is a qualitative text analysis application developed using R and Shiny, enabling non-technical users to perform analysis tasks such as text classification, topic extraction, and sentiment scoring using large language models.

Section 02

Original Author and Source


Section 03

Project Overview

KWALLM is a text analysis application designed for qualitative research, developed by Kennispunt Twente (Knowledge Center Twente, Netherlands). Built with R and the Shiny framework, it wraps large language models (LLMs) in a web interface so that social science researchers can analyze text without a programming background.


Section 04

Classification Analysis

Users define a list of categories in advance, and the model assigns each text to one of them. For example, product reviews can be categorized as "positive", "negative", or "neutral". Because the categories are fixed up front, this mode suits research that already has a clear analytical framework.
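A minimal Python sketch of this workflow, for illustration only. KWALLM itself is built with R and Shiny, so this is not its code: the function names, the prompt wording, and the `fake_llm` stand-in are all assumptions. The idea is to list the allowed categories in the prompt and accept the reply only if it matches one of them.

```python
from typing import Callable, Optional, Sequence


def build_classification_prompt(text: str, categories: Sequence[str]) -> str:
    """Ask the model to answer with exactly one of the predefined categories."""
    options = ", ".join(f'"{c}"' for c in categories)
    return (
        f"Classify the text into exactly one of these categories: {options}.\n"
        "Answer with the category name only.\n\n"
        f"Text: {text}"
    )


def classify(
    text: str, categories: Sequence[str], llm: Callable[[str], str]
) -> Optional[str]:
    """Return the matching category, or None if the reply is not in the list."""
    reply = llm(build_classification_prompt(text, categories))
    reply = reply.strip().strip('"').lower()
    for category in categories:
        if reply == category.lower():
            return category
    return None


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    return "Positive" if "love" in prompt else "neutral"


categories = ["positive", "negative", "neutral"]
label = classify("I love this product.", categories, fake_llm)
```

Returning `None` for an off-list reply, rather than guessing, leaves room for a human to review the cases the model could not place.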

Section 05

Feature Scoring

Users define a specific feature (e.g., "level of positive emotion"), and the model scores each text according to how strongly it exhibits that feature. This yields a finer-grained quantitative indicator than simple classification, which makes it suitable for research questions that measure degree rather than membership.
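A rough Python sketch of the scoring step, under stated assumptions: the source does not say which scale KWALLM uses, so the 0 to 100 range, the prompt wording, and all function names here are hypothetical. It asks for a single number, then parses and clamps the reply so that an off-scale answer cannot leak into the results.

```python
import re
from typing import Callable, Optional


def build_scoring_prompt(text: str, feature: str, low: int = 0, high: int = 100) -> str:
    """Ask for one whole number on an assumed low..high scale."""
    return (
        f"On a scale from {low} to {high}, how strongly does the text show "
        f"the following feature: {feature}?\n"
        "Answer with a single whole number.\n\n"
        f"Text: {text}"
    )


def parse_score(reply: str, low: int = 0, high: int = 100) -> Optional[int]:
    """Take the first integer in the reply and clamp it to the scale."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None  # the model did not return a number
    return max(low, min(high, int(match.group())))


def score(text: str, feature: str, llm: Callable[[str], str]) -> Optional[int]:
    """Score one text for one feature using any callable that maps prompt to reply."""
    return parse_score(llm(build_scoring_prompt(text, feature)))


# Hypothetical stand-in for a real model call.
example = score("What a wonderful day!", "level of positive emotion", lambda p: "Score: 85")
```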

Section 06

Topic Extraction

When no categories are predefined, the model identifies the topics present in the texts and assigns a label to each. The approach builds on the work of Wanrooij, Manhar & Yang (2024) and Pham et al. (2023), and is reported to outperform traditional methods such as BERTopic on small datasets.

Section 07

Text Tagging

For qualitative coding, the model marks the text segments that relate to a given code. For example, given the code "color", it highlights every segment that mentions a color (such as "yellow" in "The sun is yellow"). Users can supply their own codes or let the LLM generate codes from the text. This mode is particularly suited to long texts such as interview transcripts or focus group discussions.
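One way to picture the highlighting step, sketched in Python. This assumes the model returns the relevant segments as verbatim quotes, which the source does not confirm. The helper names and the `[code: ...]` marker format are hypothetical, and the sketch expects non-overlapping segments.

```python
from typing import List, Tuple


def locate_segments(text: str, segments: List[str]) -> List[Tuple[int, int]]:
    """Find every occurrence of each quoted segment; return sorted (start, end) spans."""
    spans = set()
    for segment in segments:
        if not segment:
            continue  # an empty quote would match everywhere
        start = text.find(segment)
        while start != -1:
            spans.add((start, start + len(segment)))
            start = text.find(segment, start + 1)
    return sorted(spans)


def highlight(text: str, spans: List[Tuple[int, int]], code: str) -> str:
    """Wrap each span in [code: ...], working right to left so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = f"{text[:start]}[{code}: {text[start:end]}]{text[end:]}"
    return text


text = "The sun is yellow"
spans = locate_segments(text, ["yellow"])  # segments as a model might quote them
marked = highlight(text, spans, "color")
```

Here `marked` is `"The sun is [color: yellow]"`. Matching quotes back to character offsets also gives a cheap sanity check: a quoted segment that cannot be found in the text was not taken from it verbatim.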


Section 08

Automatic PII Redaction

To support research ethics and compliance with data protection regulations (e.g., GDPR), KWALLM includes a layered mechanism for detecting and redacting personally identifiable information (PII):

  • Basic Detection: Uses regular expressions to identify common PII such as email addresses, phone numbers, and Dutch postal codes
  • Advanced Detection: Runs the GLiNER model locally for more thorough PII detection, so sensitive data is not sent to external APIs

This design protects the privacy of research participants without compromising the quality of the analysis.
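The regular-expression layer can be sketched in Python as follows. The patterns are illustrative assumptions, not KWALLM's actual rules: the phone pattern only covers Dutch-style numbers (a leading `0`, `+31`, or `0031` followed by nine digits), and the placeholder tokens are made up for this example. The GLiNER layer is not shown.

```python
import re

# Illustrative patterns only (assumptions, not KWALLM's actual rules).
PII_PATTERNS = [
    # Email addresses such as jan@example.nl
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Dutch-style phone numbers: 0, +31 or 0031, then nine digits,
    # each optionally preceded by a space or hyphen
    ("[PHONE]", re.compile(r"(?<!\w)(?:\+31|0031|0)(?:[\s-]?\d){9}\b")),
    # Dutch postal codes: four digits (no leading zero), optional space, two capitals
    ("[POSTCODE]", re.compile(r"\b[1-9]\d{3}\s?[A-Z]{2}\b")),
]


def redact(text: str) -> str:
    """Replace each pattern match with its placeholder, in the order listed."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "Mail jan@example.nl or call 06-12345678, postcode 7511 AB."
redacted = redact(sample)
```

With these patterns, `redacted` is `"Mail [EMAIL] or call [PHONE], postcode [POSTCODE]."`. Patterns like these catch only well-formatted identifiers; names and free-form addresses need a model-based detector, which is the role the post assigns to GLiNER.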