EmoBench-M: A New Benchmark for Evaluating Emotional Quotient of Multimodal Large Models

Introducing the EmoBench-M benchmark, the first comprehensive evaluation framework specifically designed to assess the emotional quotient (EQ) capabilities of multimodal large language models (MLLMs), covering dimensions such as emotion recognition, empathetic understanding, and emotional reasoning.

Tags: multimodal large models, EQ evaluation, emotion recognition, empathetic understanding, emotional reasoning, AI evaluation benchmark
Published 2026-04-01 23:45 · Recent activity 2026-04-01 23:54 · Estimated read: 7 min

Section 01

EmoBench-M: Guide to the New Benchmark for Evaluating EQ of Multimodal Large Models

EmoBench-M is the first comprehensive evaluation framework dedicated to assessing the emotional quotient (EQ) capabilities of multimodal large language models (MLLMs). It fills a gap left by traditional evaluations, which focus on cognitive abilities while neglecting emotional understanding. The benchmark covers three progressive EQ capabilities: emotion recognition, empathetic understanding, and emotional reasoning. Through systematic dataset construction and multi-dimensional evaluation, it helps developers pinpoint weaknesses in a model's emotional understanding, which matters for scenarios such as AI assistants and mental-health applications.

Section 02

Necessity and Background of EQ Evaluation

Traditional large-model evaluations focus on cognitive abilities (knowledge, logical reasoning, etc.), but in real-world applications AI must engage in emotional interactions with humans (e.g., a medical assistant understanding a patient's anxiety, an educational system perceiving a student's frustration). Without EQ evaluation, models can score well on technical metrics yet come across as cold and unresponsive in real scenarios, and developers have no way to locate the problem. EmoBench-M addresses this pain point by providing standardized evaluation methods and a layered EQ model that helps identify the weak links.

Section 03

Three-Layer EQ Evaluation Architecture of EmoBench-M

EmoBench-M divides EQ capabilities into three progressive levels:

  1. Emotion Recognition: The foundational layer, identifying emotions from multimodal inputs (facial expressions, voice, text). The main challenge is cross-modal information integration.
  2. Empathetic Understanding: Building on recognition, this layer covers the causes, intensity, evolution, and cultural variation of emotions, requiring social common sense and causal reasoning.
  3. Emotional Reasoning: The highest layer, covering the selection of emotional-support strategies, moral emotional reasoning, social-scenario simulation, and more. It is closest to real applications and remains the weakest link in current models.
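The three-level hierarchy above can be sketched as a small data model. Note that `EQLevel`, `BenchmarkItem`, and the sample fields are illustrative assumptions for this article, not the actual schema used by EmoBench-M:

```python
from dataclasses import dataclass
from enum import Enum

class EQLevel(Enum):
    """The three progressive EQ capability levels described above."""
    EMOTION_RECOGNITION = 1       # basic: identify emotions from multimodal input
    EMPATHETIC_UNDERSTANDING = 2  # middle: causes, intensity, evolution, culture
    EMOTIONAL_REASONING = 3       # top: support strategies, moral reasoning

@dataclass
class BenchmarkItem:
    """Hypothetical record for one evaluation sample (illustrative only)."""
    level: EQLevel
    modalities: list      # e.g. ["video", "audio", "text"]
    question: str
    reference_answer: str

# a hypothetical empathetic-understanding sample
item = BenchmarkItem(
    level=EQLevel.EMPATHETIC_UNDERSTANDING,
    modalities=["video", "text"],
    question="Why does the speaker's tone shift after the second sentence?",
    reference_answer="She realizes her colleague misread her earlier remark.",
)
```

Modeling the level as an ordered enum makes the "progressive" relationship explicit: tooling can filter or aggregate results per level.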

Section 04

Dataset Construction and Evaluation Methods

Dataset construction follows strict standards: integrating public emotion datasets such as AFEW and RAVDESS; manually annotating samples for empathetic-understanding and reasoning tasks; generating adversarial boundary cases; and balancing cross-cultural data. Evaluation uses a multi-dimensional scoring mechanism that weighs both answer correctness and the reasoning process. Open-ended tasks combine human evaluation with GPT-4-assisted scoring to ensure reliability.
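A multi-dimensional score that weighs both correctness and reasoning quality might look like the sketch below. The weights and the [0, 1] reasoning scale are assumptions for illustration; the paper's actual scoring formula may differ, and the reasoning score could come from human raters or an LLM-assisted judge:

```python
def score_response(answer_correct: bool, reasoning_score: float,
                   w_answer: float = 0.6, w_reasoning: float = 0.4) -> float:
    """Hypothetical weighted score combining answer correctness with
    reasoning quality. reasoning_score is expected in [0, 1]; the
    0.6/0.4 weights are illustrative, not taken from the benchmark."""
    if not 0.0 <= reasoning_score <= 1.0:
        raise ValueError("reasoning_score must be in [0, 1]")
    return w_answer * float(answer_correct) + w_reasoning * reasoning_score

# a correct answer with partially sound reasoning
print(round(score_response(True, 0.5), 2))  # 0.8
```

Separating the two dimensions lets developers see whether a model fails by giving wrong answers or by reaching right answers through unsound reasoning.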

Section 05

Analysis of EQ Performance of Current Multimodal Large Models

Preliminary evaluations show obvious hierarchical differences in models' EQ performance:

  • Emotion Recognition: Mainstream models have high accuracy (especially in facial expression recognition), benefiting from pre-trained image-text alignment data.
  • Empathetic Understanding: Performance varies; models can understand obvious causal relationships but struggle with scenarios involving implicit social common sense.
  • Emotional Reasoning: Almost all models struggle: their responses seem reasonable but turn mechanical or inappropriate in scenarios requiring deep emotional intelligence. This finding indicates that high-emotional-intelligence scenarios still need human supervision or hybrid-architecture support.
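The layered gap described above is easiest to see when results are aggregated per EQ level. A minimal sketch, assuming results arrive as (level, correct) pairs; the demo numbers are invented to illustrate the typical pattern, not actual benchmark figures:

```python
from collections import defaultdict

def per_level_accuracy(results):
    """results: iterable of (level_name, is_correct) pairs.
    Returns mean accuracy per EQ level, exposing hierarchy gaps."""
    totals = defaultdict(lambda: [0, 0])  # level -> [n_correct, n_total]
    for level, correct in results:
        totals[level][0] += int(correct)
        totals[level][1] += 1
    return {lvl: c / n for lvl, (c, n) in totals.items()}

# invented demo data showing accuracy falling as the level gets harder
demo = [("recognition", True), ("recognition", True), ("recognition", False),
        ("empathy", True), ("empathy", False),
        ("reasoning", False), ("reasoning", False)]
acc = per_level_accuracy(demo)
```

Reporting per-level rather than overall accuracy is what lets a benchmark like this locate *which* capability layer is weak.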

Section 06

Application Scenarios and Industrial Value of EmoBench-M

EmoBench-M impacts multiple fields:

  • Mental Health: Evaluate the emotional understanding ability of AI psychological counseling assistants.
  • Education: Help educational AI perceive students' emotions and adjust teaching strategies.
  • Customer Service: Optimize emotional interactions of intelligent customer service to improve satisfaction.
  • Content Moderation: Accurately identify harmful content or users in need of support.
  • Entertainment: Endow virtual characters with real emotional responses to enhance immersion.

Section 07

Limitations and Future Development Directions

EmoBench-M has limitations: cultural coverage is narrow (it draws mainly on Western culture); static samples cannot capture dynamic, continuous emotional interaction; and ethical boundaries need deeper discussion (e.g., defining what counts as a "good" emotional response). Future directions include expanding cross-cultural data, introducing interactive evaluation, establishing a causal-explanation mechanism for EQ, and exploring the relationship between EQ and other cognitive abilities.