Zing Forum

Big Five Personality Model Meets Large Language Models: Does AI Have a "Personality"?

A research team from the University of Alicante in Spain conducted a systematic evaluation of large language models (LLMs) using the classic psychological Big Five Personality Inventory, exploring whether AI has measurable "personality traits" and the psychometric validity of such measurements.

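Tags: Big Five personality, psychometrics, large language models, personality assessment, AI safety, University of Alicante, content validity, factor analysis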
Published 2026-05-20 17:14 | Recent activity 2026-05-20 17:18 | Estimated read 8 min

Section 01

[Introduction] Evaluating LLMs with the Big Five Personality Model: Does AI Have Measurable "Personality"?

This post summarizes a study in which a research team from the University of Alicante in Spain systematically evaluated large language models (LLMs) with the classic Big Five Personality Inventory, asking whether AI has measurable "personality traits" and whether such measurements are psychometrically valid. The study covers three core dimensions: content validity verification, norm data establishment, and factor structure analysis. Its results bear on model selection, AI safety assessment, anthropomorphic understanding, and model alignment.

Section 02

Background: Collision Between a Psychological Classic and AI

The Big Five Personality Model is one of the most widely accepted personality assessment frameworks in psychology. It describes personality along five dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Validated over decades, it is widely used in talent selection, clinical diagnosis, and other fields. With the rapid development of LLMs, a new question has emerged: does AI exhibit human-like personality traits? And if we ask an AI to fill out a personality questionnaire, how should we interpret the results?

Section 03

Project Overview: Testing LLM Personality Using Psychometric Methods

The team from the University of Alicante launched a pioneering evaluation effort aimed at testing the performance of LLMs in Big Five personality tests using rigorous psychometric methods. The project is not limited to philosophical discussions but focuses on specific questions: Does the Big Five Personality Inventory have content validity for AI? Is there a pattern in the "personality" distribution among different models? Does the measurement result conform to the factor structure of psychological theory?

Section 04

Core Research Dimensions: Content Validity, Norms, and Factor Analysis

Content Validity Verification

Are the classic Big Five questionnaire items suitable for evaluating AI? Humans answer based on self-awareness and experience, while AI generates text based on patterns in its training data. The open question is whether this difference undermines measurement validity. The team evaluated the applicability of the scale through systematic testing.

Norm Data Establishment

Using test data collected from mainstream LLMs, the team attempted to establish AI "personality norms", both to compare models with one another and to examine how factors such as model size, architecture, and training data correlate with "personality traits".
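To make the idea of a norm concrete, here is a minimal sketch of how a single model's trait score could be expressed relative to a reference group of models. It is not the team's code: the model names, the scores, and the helper names `build_norm` and `z_score` are all hypothetical, and a 1-5 scoring scale is assumed.

```python
from statistics import mean, stdev

# Made-up Agreeableness scores on an assumed 1-5 scale, for illustration only.
reference_scores = {
    "model_a": 3.0,
    "model_b": 3.5,
    "model_c": 4.0,
    "model_d": 4.5,
    "model_e": 5.0,
}


def build_norm(scores):
    """Return (mean, sample standard deviation) of a reference group."""
    values = list(scores.values())
    if len(values) < 2:
        raise ValueError("a norm needs at least two reference scores")
    return mean(values), stdev(values)


def z_score(raw, norm):
    """Express a raw score in standard-deviation units relative to the norm."""
    norm_mean, norm_sd = norm
    if norm_sd == 0:
        raise ValueError("reference scores have no variance")
    return (raw - norm_mean) / norm_sd


norm = build_norm(reference_scores)
z = z_score(4.5, norm)  # how far a score of 4.5 sits above the group mean
```

A standardized score like this only says where a model sits relative to the other models in the reference group. It says nothing about how the model compares with human norms.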

Factor Structure Analysis

The team tested whether the AI's response patterns reproduce the five-factor structure of the Big Five model or fall into a different set of dimensions. The answer matters for understanding what AI "personality" scores actually measure.
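A full factor analysis needs a dedicated statistics package, but the underlying intuition can be sketched with a much cruder check: items meant to measure the same trait should correlate more strongly with each other than with items from other traits. The sketch below is a simplified stand-in, not the study's analysis. The item names, the item-to-trait key, and the response data are all made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical item-to-trait key and made-up responses (one list per item,
# one entry per run). Neither comes from the study.
TRAIT_OF = {"e1": "Extraversion", "e2": "Extraversion",
            "a1": "Agreeableness", "a2": "Agreeableness"}
responses = {
    "e1": [1, 2, 3, 4, 5, 3],
    "e2": [2, 2, 3, 5, 5, 3],
    "a1": [5, 1, 4, 2, 3, 3],
    "a2": [4, 1, 5, 2, 3, 2],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("an item with constant responses has no correlation")
    return cov / sqrt(var_x * var_y)


def within_vs_cross(data, trait_of):
    """Return mean absolute correlation for same-trait and cross-trait item pairs."""
    within, cross = [], []
    for i, j in combinations(sorted(data), 2):
        r = abs(pearson(data[i], data[j]))
        (within if trait_of[i] == trait_of[j] else cross).append(r)
    return sum(within) / len(within), sum(cross) / len(cross)


within_mean, cross_mean = within_vs_cross(responses, TRAIT_OF)
```

If `within_mean` is not clearly larger than `cross_mean`, the items are unlikely to separate into the intended traits. Note the guard in `pearson`: a model that gives the same answer to an item on every run produces zero variance, which makes correlation-based analysis impossible for that item.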

Section 05

Technical Methods and Data: Standardized Scales and Multi-Model Testing

The project uses a standardized Big Five Personality Inventory as the measurement tool, conducting batch tests on mainstream LLMs such as the GPT series, Llama, and Claude. Through carefully designed prompt engineering, the team ensures that models complete the evaluation under consistent conditions; data collection includes multiple independent runs of the models to control random variation, while recording metadata such as model version and parameter size to provide a basis for subsequent correlation analysis.
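The scoring step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's pipeline: it assumes a 5-point Likert scale, a hypothetical four-item key with reverse-keyed items, and responses that have already been parsed from model output into integers. The names `ITEM_KEY`, `score_run`, and `summarize_runs` are invented for this example.

```python
from statistics import mean, stdev

# Hypothetical item key: item id -> (trait, reverse_keyed).
# This is not the inventory used in the study.
ITEM_KEY = {
    1: ("Extraversion", False),
    2: ("Extraversion", True),
    3: ("Agreeableness", False),
    4: ("Agreeableness", True),
}
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale


def score_run(responses):
    """Return the mean item score per trait for one questionnaire run."""
    per_trait = {}
    for item_id, raw in responses.items():
        if not SCALE_MIN <= raw <= SCALE_MAX:
            raise ValueError(f"item {item_id}: response {raw} is off the scale")
        trait, reverse = ITEM_KEY[item_id]
        # Reverse-keyed items are flipped so that a high value always
        # means "more of the trait".
        value = SCALE_MIN + SCALE_MAX - raw if reverse else raw
        per_trait.setdefault(trait, []).append(value)
    return {trait: mean(values) for trait, values in per_trait.items()}


def summarize_runs(runs):
    """Aggregate independent runs into (mean, sample standard deviation) per trait."""
    scored = [score_run(r) for r in runs]
    summary = {}
    for trait in scored[0]:
        values = [s[trait] for s in scored]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[trait] = (mean(values), spread)
    return summary


# Made-up responses from three runs of one model, for illustration only.
runs = [
    {1: 4, 2: 2, 3: 5, 4: 1},
    {1: 4, 2: 2, 3: 4, 4: 2},
    {1: 5, 2: 1, 3: 5, 4: 1},
]
summary = summarize_runs(runs)
```

Reporting the standard deviation across runs alongside the mean shows how much of a trait score is stable and how much is sampling noise, which is the reason for running each model several times.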

Section 06

Research Significance: Multi-Faceted Insights for Model Selection, Safety, etc.

The study's significance goes beyond academic curiosity:

  • Model Selection: Different scenarios require AI with different "personalities" (e.g., customer service prefers high Agreeableness, creative writing requires high Openness);
  • Safety Assessment: Certain trait combinations may be related to harmful output tendencies, and personality assessment can supplement AI safety testing;
  • Anthropomorphic Understanding: Scientific measurement helps distinguish between real trait patterns and human projections;
  • Model Alignment: Understanding trait tendencies helps design more precise alignment strategies to make AI behavior more in line with human expectations.

Section 07

Limitations and Future Directions: Correct Interpretation of AI Personality Scores

Note: AI's "personality scores" should not be over-interpreted as psychological traits in the human sense. LLMs have no personality or consciousness; their answers reflect patterns of personality descriptions in training data and simulations of human language habits.

Future research directions include: the correlation between personality assessment results and model performance on specific tasks; the possibility of "shaping" model personality through fine-tuning or prompt engineering; and cross-cultural and cross-linguistic differences in model personality.

Section 08

Conclusion: The Importance of Rigorously Examining AI Behavioral Characteristics

The research from the University of Alicante represents a rigorous scientific approach to examining the behavioral characteristics of AI systems, reminding us that while we marvel at the capabilities of LLMs, we need to develop corresponding assessment tools and theoretical frameworks to understand these systems. Whether AI truly has a "personality" or not, measuring the behavioral patterns it exhibits is an important step toward building safer and more controllable AI systems. The project's code and data have been open-sourced, providing a foundation for subsequent research.