Zing Forum

The Impact of Politeness on LLMs: A Cross-Lingual, Multi-Model Study Using the PLUM Corpus

This study uses the PLUM corpus to investigate how polite language affects the response quality of large language models (LLMs). The experiments cover 3 languages, 5 models, and 22,500 prompt-response pairs. The study finds that polite prompts can improve response quality by approximately 11%, but the effect varies across languages and models and is not universal.

Tags: politeness, LLM behavior, cross-linguistic, multilingual, PLUM corpus, prompt engineering, cultural differences, human-AI interaction
Published 2026-04-18 01:33 · Recent activity 2026-04-20 10:57 · Estimated read: 8 min

Section 01

[Introduction] The Impact of Politeness on LLM Response Quality: Core Overview of Cross-Lingual Multi-Model Research

This paper, The Impact of Politeness on LLMs: A Cross-Lingual, Multi-Model Study Using the PLUM Corpus, examines whether polite language affects the response quality of large language models (LLMs). The study covers 3 languages (English, Hindi, Spanish), 5 models (Gemini-Pro, GPT-4o Mini, Claude 3.7 Sonnet, DeepSeek-Chat, Llama 3), and 22,500 prompt-response pairs. Key findings: polite prompts improve response quality by an average of about 11%, but the effect varies across languages (e.g., Hindi favors respectful, indirect expressions, while Spanish favors firm, confident ones) and models (e.g., Llama 3 is the most tone-sensitive, while GPT-4o Mini is more robust). The tone of dialogue history also affects the quality of current responses. The study aims to characterize how politeness interacts with LLMs, offering guidance for users' communication strategies and developers' model optimization.

Section 02

Research Background and Theoretical Foundations

The study is based on two classic sociolinguistic theories:

  1. Brown and Levinson's Politeness Theory: treats politeness as "facework," comprising positive face (the need for approval and recognition) and negative face (the need for freedom of action); polite language serves to maintain both parties' face.
  2. Culpeper's Impoliteness Framework: studies behaviors that deliberately attack or disregard others' face (e.g., imperative tone, sarcasm).

These theories provide an analytical framework for classifying the politeness level of prompts and observing differences in LLM responses. Moreover, as LLMs become woven into daily life, understanding how politeness affects their performance has practical value, such as guiding users toward effective communication and adapting usage strategies across cultures.

Section 03

Research Methods: PLUM Corpus and Evaluation Framework

PLUM Corpus:

  • Sample size: 22,500 prompt-response pairs
  • Coverage: 3 languages (English, Hindi, Spanish), 5 models, 3 types of dialogue history (original, polite, impolite)
  • Politeness-level annotation: 5 levels (very polite → very impolite), manually verified for consistency.

Evaluation Framework: response quality is scored along 8 dimensions: coherence, clarity, depth, responsiveness, context retention, toxicity, conciseness, and readability.
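As an illustration, a PLUM-style record and an aggregate quality score might be sketched as follows. The field names, score ranges, and the unweighted-mean aggregation are assumptions made for this example, not the corpus's actual schema or the paper's scoring rule.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record shape for one prompt-response pair;
# all field names here are illustrative assumptions.
@dataclass
class PromptResponsePair:
    language: str          # "en", "hi", or "es"
    model: str             # e.g. "llama-3"
    history_type: str      # "original", "polite", or "impolite"
    politeness_level: int  # 1 (very impolite) .. 5 (very polite)
    scores: dict           # the eight evaluation dimensions -> 0..1

DIMENSIONS = ["coherence", "clarity", "depth", "responsiveness",
              "context_retention", "toxicity", "conciseness", "readability"]

def quality_score(pair: PromptResponsePair) -> float:
    """Unweighted mean over the eight dimensions; toxicity is inverted,
    since lower toxicity means higher quality."""
    vals = []
    for dim in DIMENSIONS:
        v = pair.scores[dim]
        vals.append(1.0 - v if dim == "toxicity" else v)
    return mean(vals)

example = PromptResponsePair("en", "llama-3", "polite", 5,
                             {d: 0.8 for d in DIMENSIONS})
print(round(quality_score(example), 3))  # -> 0.725
```

Inverting toxicity keeps the aggregate monotone (higher is always better); a weighted mean would work the same way if some dimensions matter more.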

Section 04

Core Findings: The Complexity of Politeness Effects

  1. Politeness does affect quality: polite prompts improve response quality by an average of about 11% over neutral prompts, while impolite prompts degrade it.
  2. Cross-language differences: English adapts to a range of tones; Hindi favors respectful, indirect expressions; Spanish favors firm, confident ones.
  3. Cross-model differences: Llama 3 is the most tone-sensitive (an effect range of 11.5%); GPT-4o Mini is robust to impolite inputs; the other models show moderate sensitivity.
  4. Impact of dialogue history: when the tone of earlier turns is negative, response quality can suffer even if the current prompt is polite.
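The ~11% figure above is a relative improvement over a neutral baseline. A minimal sketch of that computation, using made-up placeholder averages rather than the study's data:

```python
# Relative improvement of polite-prompt quality over a neutral baseline.
# The two averages below are illustrative placeholders, not study results.
def relative_improvement(polite_avg: float, neutral_avg: float) -> float:
    """Percent change of the polite condition over the neutral condition."""
    return (polite_avg - neutral_avg) / neutral_avg * 100

print(round(relative_improvement(0.777, 0.700), 1))  # -> 11.0
```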

Section 05

Analysis of the Causes of Differences

The differences stem from three aspects:

  1. Training data characteristics: the contextual distribution of each language's training data (e.g., Hindi corpora contain more formal, respectful contexts) and the scale and diversity of a model's training data (e.g., GPT's diverse data makes it robust).
  2. Alignment training strategies: vendors design different responses to polite inputs (e.g., some models are explicitly trained to respond positively to polite prompts).
  3. Cultural bias: models may mirror the cultural values of their training data; English-centric training, for instance, can leave a model insufficiently sensitive to other cultures' norms.

Section 06

Practical Implications: Recommendations for Users and Developers

User Recommendations:

  1. Maintain basic politeness (average 11% quality improvement);
  2. Consider language and culture (e.g., use respectful and indirect expressions for Hindi, direct and firm expressions for Spanish);
  3. Adapt to model characteristics (Llama 3 requires more politeness, GPT can be more direct);
  4. Maintain dialogue tone (avoid letting negative history accumulate).

Developer Recommendations:

  1. Cultural adaptation design (adjust sensitivity by language/region);
  2. Robustness training (improve tolerance of impolite inputs);
  3. Transparent communication (inform users that tone affects responses).
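The user-side recommendations above could be sketched as a small helper that wraps a bare request in a language-appropriate polite framing. The phrasings, language codes, and function name are illustrative assumptions, not anything proposed by the paper.

```python
# Illustrative politeness framings per language; these specific phrasings
# are assumptions for the sketch, chosen to echo the paper's findings
# (respectful/indirect for Hindi, courteous-but-direct for Spanish).
POLITE_PREFIX = {
    "en": "Could you please",
    "hi": "Kripya",       # respectful, indirect register
    "es": "Por favor,",   # courteous but direct
}

def politen(prompt: str, language: str = "en") -> str:
    """Wrap a bare request in a polite framing for the given language."""
    prefix = POLITE_PREFIX.get(language, POLITE_PREFIX["en"])
    return f"{prefix} {prompt[0].lower()}{prompt[1:]}"

print(politen("Summarize this article.", "en"))
# -> Could you please summarize this article.
```

In a real application the framing would be generated in the target language end to end; the dictionary lookup here only illustrates the idea of adapting tone per language and model.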

Section 07

Limitations and Future Directions

Limitations:

  1. Limited language coverage (only 3 languages, missing Chinese, Arabic, etc.);
  2. Task type restrictions (only general dialogue, not involving professional tasks like programming);
  3. Static evaluation (single interactions only; long-term dynamics not studied).

Future Directions:

  1. Expand language coverage;
  2. Task-specific studies (e.g., creative writing, code generation);
  3. Dynamic interaction research (how politeness effects evolve over multi-turn dialogues);
  4. Intervention strategies (enabling models to maintain high-quality responses under any tone).