Zing Forum

SOTOPIA-TOM: Theory of Mind and Information Management Evaluation in Multi-Agent Interactions

SOTOPIA-TOM is a multi-dimensional benchmark framework that evaluates the ability of LLM agents to manage information in multi-party interactions with information asymmetry and privacy sensitivity, revealing the persistent limitations of current models in complex coordination scenarios.

Tags: multi-agent systems, Theory of Mind, information management, privacy protection, information asymmetry, benchmark
Published 2026-05-04 15:59 | Recent activity 2026-05-05 10:43 | Estimated read 7 min

Section 01

SOTOPIA-TOM Benchmark Framework: Information Management and Theory of Mind Evaluation in Multi-Agent Interactions

SOTOPIA-TOM is a multi-dimensional benchmark framework for evaluating how well LLM agents manage information in multi-party interactions that involve information asymmetry and privacy-sensitive content. The benchmark reveals persistent limitations of current models in complex coordination scenarios, and the reported experiments show that Theory of Mind (ToM) interventions substantially improve agents' information management performance.

Section 02

Information Management Challenges in Multi-Agent Interactions

As LLM agents increasingly take part in multi-party interactions, handling information asymmetry properly (knowing when to disclose information, and to whom) has become a key requirement. Existing benchmarks, however, do not measure this ability in realistic multi-party settings, which holds back the development of multi-agent systems.

Section 03

Core Design of the SOTOPIA-TOM Framework

Core Objectives

SOTOPIA-TOM evaluates how well agents navigate settings that combine information asymmetry, privacy-sensitive interactions, and multi-party coordination among 3 to 5 agents.

Interaction Environment

The environment supports two communication modes:

  • Public communication (broadcast): Share public information and coordinate actions
  • Private communication (direct message): Simulate private negotiations
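
The two channels can be pictured as a small message router. The following Python sketch is only an illustration under stated assumptions: the names `Message`, `Environment`, and `send` are hypothetical and are not taken from the SOTOPIA-TOM codebase. A message with no recipient is treated as a broadcast, and a message with a recipient is delivered to that agent only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    sender: str
    content: str
    recipient: Optional[str] = None  # None means broadcast (assumed convention)


@dataclass
class Environment:
    """Hypothetical environment with one inbox per agent."""
    agents: List[str]
    inboxes: Dict[str, List[Message]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every agent with an empty inbox.
        self.inboxes = {name: [] for name in self.agents}

    def send(self, msg: Message) -> None:
        if msg.recipient is None:
            # Public channel: everyone except the sender receives the message.
            targets = [a for a in self.agents if a != msg.sender]
        else:
            # Private channel: only the named recipient receives the message.
            if msg.recipient not in self.inboxes:
                raise ValueError(f"unknown recipient: {msg.recipient}")
            targets = [msg.recipient]
        for target in targets:
            self.inboxes[target].append(msg)


env = Environment(agents=["alice", "bob", "carol"])
env.send(Message(sender="alice", content="Meeting moved to 3pm"))
env.send(Message(sender="alice", content="Budget cap is confidential", recipient="bob"))
```

In this sketch, `bob` ends up with both messages while `carol` sees only the broadcast, which is the kind of channel-dependent visibility the benchmark relies on.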

Scenario Design

The benchmark includes 160 manually reviewed scenarios covering 8 industry domains. Each agent holds unique information fragments, and how information spreads depends on which channel is used.
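
One way to picture such a scenario is as a mapping from agents to the fragments they hold. The sketch below is hypothetical: the `Scenario` class, the fragment identifiers, and the `"healthcare"` domain label are invented for illustration, and it assumes that "unique" means no fragment is held by two agents at the start.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Scenario:
    """Hypothetical scenario record: which agent starts with which fragments."""
    domain: str
    fragments: Dict[str, Set[str]]  # agent name -> fragment ids

    def validate(self) -> None:
        # The source describes scenarios with 3 to 5 agents.
        if not 3 <= len(self.fragments) <= 5:
            raise ValueError("expected 3 to 5 agents")
        seen: Set[str] = set()
        for agent, frags in self.fragments.items():
            overlap = seen & frags
            if overlap:
                # Assumption: fragments are disjoint across agents.
                raise ValueError(f"{agent} shares fragments: {sorted(overlap)}")
            seen |= frags

    def missing_for(self, agent: str) -> Set[str]:
        # Fragments the agent would have to acquire from others.
        all_fragments = set().union(*self.fragments.values())
        return all_fragments - self.fragments[agent]


scenario = Scenario(
    domain="healthcare",  # illustrative label, not from the source
    fragments={"alice": {"f1"}, "bob": {"f2"}, "carol": {"f3", "f4"}},
)
scenario.validate()
gap = scenario.missing_for("alice")
```

Here `gap` contains the fragments `alice` lacks, which is the starting point for measuring both information sharing (by the others) and information acquisition (by `alice`).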

Section 04

Multi-Dimensional Evaluation System and INFOMGMT Metric

Evaluation Dimensions

  1. Information Sharing Ability: Timely sharing of useful information with appropriate parties
  2. Information Acquisition Ability: Proactively seeking missing information
  3. Coordination Efficiency: Metrics like task completion time and communication rounds
  4. Privacy Protection: Preventing improper disclosure of sensitive information

Composite Metric

The research team combined the four dimensions into a composite INFOMGMT metric, so that each run can be summarized by a single score.
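
The summary above does not state how the four dimensions are combined. As a rough sketch only, the Python below assumes each dimension has already been normalized to the range 0 to 1 (with privacy expressed as "higher is better") and aggregates them with a weighted mean. The class `DimensionScores`, the function `composite_score`, and the equal weights are all assumptions, not the paper's definition of INFOMGMT.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DimensionScores:
    """Hypothetical per-run scores, each assumed to lie in [0, 1]."""
    sharing: float
    acquisition: float
    coordination: float
    privacy: float  # e.g. 1 - violation rate (assumed convention)


def composite_score(
    scores: DimensionScores,
    weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Weighted mean of the four dimensions (illustrative aggregation only)."""
    values = (scores.sharing, scores.acquisition, scores.coordination, scores.privacy)
    if len(weights) != len(values):
        raise ValueError("expected one weight per dimension")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be normalized to [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


example = composite_score(DimensionScores(0.5, 0.5, 0.5, 0.5))
```

A weighted mean is only one possible design. A stricter aggregation, such as a product or a minimum, would penalize an agent that scores well on coordination while leaking private information.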

Section 05

Experimental Results: Model Performance and ToM Intervention Effects

Model Coverage

The experiments cover 6 LLM backbone models and 3 prompting strategies: a baseline (standard prompt), a privacy-enhanced prompt (CoT-privacy), and a Theory of Mind intervention (ToM-based).

Key Findings

  • Even top models such as GPT-5 reached an INFOMGMT score of only 62%, with weaknesses in information seeking and in privacy decisions.
  • The ToM intervention had a large effect: GPT-4o's privacy violation rate dropped from 9.9% to 2.2% (a 77.8% relative reduction), and its INFOMGMT score rose from 15% to 40% (a 166.7% relative increase).
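
The percentages in parentheses for GPT-4o are relative changes, not percentage-point differences. The short Python check below reproduces them from the before and after values quoted above; the helper name `relative_change` is just for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Privacy violation rate: 9.9% -> 2.2% (a drop of 7.7 percentage points)
violation_change = relative_change(9.9, 2.2)

# INFOMGMT score: 15% -> 40% (a gain of 25 percentage points)
score_change = relative_change(15, 40)

print(f"violation rate: {violation_change:.1f}%")  # violation rate: -77.8%
print(f"INFOMGMT score: {score_change:.1f}%")      # INFOMGMT score: 166.7%
```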

Section 06

Comparative Analysis of Prompting Strategies

The three strategies differ in their effects:

  • Standard prompt: Lacks explicit consideration of privacy and coordination
  • CoT-privacy: Improves privacy protection but may impair coordination efficiency
  • ToM intervention: Achieves the best balance between coordination and privacy

Section 07

Technical Insights and Application Value

Exposed Limitations

Current LLM agents have persistent deficiencies in complex information-asymmetric coordination, privacy-aware decision-making, and Theory of Mind capabilities.

Platform Value

SOTOPIA-TOM is a scalable testing platform that supports, among other things, the development of privacy-aware multi-agent systems and research on applications of ToM.

Practical Applications

The results can be applied to scenarios such as intelligent customer service, negotiation decision support, privacy-protecting AI, and social robots.

Section 08

Future Directions and Conclusions

Future Research

  1. Enhancing Theory of Mind capabilities
  2. Optimizing privacy-utility trade-offs
  3. Learning information management strategies in multi-agent settings
  4. Studying cross-domain generalization

Conclusions

Through realistic scenarios and a multi-dimensional evaluation, SOTOPIA-TOM systematically exposes the information management limitations of LLM agents. The strong effect of ToM interventions suggests that explicit Theory of Mind modeling is a key direction for improving multi-agent systems, and the benchmark offers a shared reference point for related research.