# SOTOPIA-TOM: Theory of Mind and Information Management Evaluation in Multi-Agent Interactions

> SOTOPIA-TOM is a multi-dimensional benchmark framework that evaluates the ability of LLM agents to manage information in multi-party interactions with information asymmetry and privacy sensitivity, revealing the persistent limitations of current models in complex coordination scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-04T07:59:18.000Z
- Last activity: 2026-05-05T02:43:19.722Z
- Popularity: 137.3
- Keywords: multi-agent systems, theory of mind, information management, privacy protection, information asymmetry, benchmarking
- Page link: https://www.zingnex.cn/en/forum/thread/sotopia-tom
- Canonical: https://www.zingnex.cn/forum/thread/sotopia-tom
- Markdown source: floors_fallback

---

## SOTOPIA-TOM Benchmark Framework: Information Management and Theory of Mind Evaluation in Multi-Agent Interactions

SOTOPIA-TOM evaluates how well LLM agents manage information in multi-party interactions where information is unevenly distributed and some of it is privacy-sensitive. Its results show persistent limitations of current models in complex coordination scenarios, and Theory of Mind (ToM) interventions were found to substantially improve agents' information management performance.

## Information Management Challenges in Multi-Agent Interactions

As LLM agents increasingly take part in multi-party interactions, handling information asymmetry properly (knowing when to disclose information and to whom) has become a key requirement. Existing benchmarks, however, do not measure this ability in realistic multi-party scenarios, which holds back the development of multi-agent systems.

## Core Design of the SOTOPIA-TOM Framework

### Core Objectives
SOTOPIA-TOM evaluates how agents navigate information asymmetry, privacy-sensitive interactions, and multi-party coordination (3-5 agents).
### Interaction Environment
The environment supports two communication modes:
- Public communication (broadcast): Share public information and coordinate actions
- Private communication (direct message): Simulate private negotiations
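
The two channels above can be pictured with a small message log. This is only an illustrative sketch, not the benchmark's actual implementation: the `Message` and `Conversation` classes, their fields, and the agent names are all hypothetical. A message with no recipient is treated as a broadcast, and a direct message is visible only to its sender and recipient.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    sender: str
    text: str
    recipient: Optional[str] = None  # None means broadcast to all agents


@dataclass
class Conversation:
    agents: list[str]
    log: list[Message] = field(default_factory=list)

    def send(self, sender: str, text: str, recipient: Optional[str] = None) -> None:
        # Reject messages that involve agents outside the conversation.
        if sender not in self.agents:
            raise ValueError(f"unknown sender: {sender}")
        if recipient is not None and recipient not in self.agents:
            raise ValueError(f"unknown recipient: {recipient}")
        self.log.append(Message(sender, text, recipient))

    def visible_to(self, agent: str) -> list[Message]:
        # An agent sees every broadcast plus the direct messages it sent or received.
        return [
            m for m in self.log
            if m.recipient is None or agent in (m.sender, m.recipient)
        ]


conv = Conversation(agents=["alice", "bob", "carol"])
conv.send("alice", "Kickoff is at 10am.")                # broadcast
conv.send("alice", "My budget ceiling is 50k.", "bob")   # direct message
conv.send("carol", "I can join at 10am.")                # broadcast

for name in conv.agents:
    print(name, len(conv.visible_to(name)))  # alice 3, bob 3, carol 2
```

Because visibility is derived from the log, the same structure can later be used to ask which agents were exposed to a given piece of information.
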
### Scenario Design
The benchmark includes 160 manually reviewed scenarios covering 8 industry domains. Each agent holds unique information fragments, and how information spreads depends on the channel used.
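
One way to picture such a scenario is as a record of agents plus the fragments each one holds. The sketch below is an assumption made for illustration only: the `Scenario` and `Fragment` classes, the `sensitive` flag, and the example values are hypothetical and do not reflect the benchmark's real data format. The validation encodes only the two constraints stated above: 3 to 5 agents, and fragments that are unique to their owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    owner: str
    content: str
    sensitive: bool  # hypothetical flag: should this stay away from unintended parties?


@dataclass
class Scenario:
    domain: str
    agents: list[str]
    fragments: list[Fragment]

    def validate(self) -> None:
        # The source describes scenarios with 3 to 5 agents.
        if not 3 <= len(self.agents) <= 5:
            raise ValueError("expected 3 to 5 agents")
        # Each fragment must belong to a known agent.
        for f in self.fragments:
            if f.owner not in self.agents:
                raise ValueError(f"fragment owner {f.owner!r} is not an agent")
        # No two agents may start with the same fragment.
        contents = [f.content for f in self.fragments]
        if len(contents) != len(set(contents)):
            raise ValueError("fragments must be unique")

    def held_by(self, agent: str) -> list[Fragment]:
        return [f for f in self.fragments if f.owner == agent]


scenario = Scenario(
    domain="example-domain",  # placeholder, not one of the benchmark's 8 domains
    agents=["alice", "bob", "carol"],
    fragments=[
        Fragment("alice", "The deadline moved to Friday.", sensitive=False),
        Fragment("bob", "The client's budget ceiling is 50k.", sensitive=True),
        Fragment("carol", "Two engineers are available next week.", sensitive=False),
    ],
)
scenario.validate()
print([f.content for f in scenario.held_by("bob")])
```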

## Multi-Dimensional Evaluation System and INFOMGMT Metric

### Evaluation Dimensions
1. **Information Sharing Ability**: Timely sharing of useful information with appropriate parties
2. **Information Acquisition Ability**: Proactively seeking missing information
3. **Coordination Efficiency**: Measured by metrics such as task completion time and number of communication rounds
4. **Privacy Protection**: Preventing improper disclosure of sensitive information
### Composite Metric
The research team combined the four dimensions into a composite INFOMGMT metric that summarizes an agent's performance in a single score.
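
The post does not state how the four dimensions are combined, so the following is only a sketch under explicit assumptions: each dimension is already normalized to [0, 1] with higher meaning better, and the composite is a weighted mean with equal weights by default. The `DimensionScores` class, the `composite_score` function, and the example values are hypothetical and should not be read as the actual INFOMGMT formula.

```python
from dataclasses import dataclass


@dataclass
class DimensionScores:
    # Assumption: all four scores are normalized to [0, 1], higher is better.
    sharing: float
    acquisition: float
    coordination: float
    privacy: float


def composite_score(s: DimensionScores,
                    weights: tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted mean of the four dimensions (an assumed aggregation, not the paper's)."""
    values = (s.sharing, s.acquisition, s.coordination, s.privacy)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("scores must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


# Made-up scores, purely for illustration.
example = DimensionScores(sharing=0.6, acquisition=0.4, coordination=0.5, privacy=0.9)
print(round(composite_score(example), 3))  # 0.6
```

A weighted mean lets strength in one dimension offset weakness in another; an aggregation that penalizes any single weak dimension (for example a minimum or a product) would behave differently, which is why the actual definition matters when comparing scores.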

## Experimental Results: Model Performance and ToM Intervention Effects

### Model Coverage
The experiments cover 6 LLM backbone models and three prompting strategies: baseline (standard prompt), privacy enhancement (CoT-privacy), and Theory of Mind intervention (ToM-based).
### Key Findings
- Even top models such as GPT-5 reached only a 62% INFOMGMT score, with weaknesses in seeking missing information and in making privacy decisions.
- ToM intervention had a large effect: GPT-4o's privacy violation rate dropped from 9.9% to 2.2% (a 77.8% relative reduction, or 7.7 percentage points), and its INFOMGMT score rose from 15% to 40% (a 166.7% relative increase, or 25 percentage points).
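
The percentages quoted for GPT-4o are relative changes, not percentage-point differences. The short check below reproduces them from the before and after values given above; the `relative_change` helper is just an illustrative name.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    if before == 0:
        raise ValueError("relative change is undefined when before == 0")
    return (after - before) / before * 100.0


# Privacy violation rate: 9.9% -> 2.2%
print(f"{relative_change(9.9, 2.2):.1f}%")    # -77.8%

# INFOMGMT score: 15% -> 40%
print(f"{relative_change(15.0, 40.0):.1f}%")  # 166.7%
```

In absolute terms these are a drop of 7.7 percentage points and a gain of 25 percentage points, which is worth keeping in mind because the INFOMGMT baseline of 15% is low.
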

## Comparative Analysis of Prompting Strategies

The three strategies differ in their effects:
- **Standard prompt**: Lacks explicit consideration of privacy and coordination
- **CoT-privacy**: Improves privacy protection but may impair coordination efficiency
- **ToM intervention**: Achieves the best balance between coordination and privacy

## Technical Insights and Application Value

### Exposed Limitations
Current LLM agents have persistent deficiencies in complex information-asymmetric coordination, privacy-aware decision-making, and Theory of Mind capabilities.
### Platform Value
SOTOPIA-TOM is a scalable testing platform that supports the development of privacy-aware multi-agent systems and research on ToM applications, among others.
### Practical Applications
The results can be applied to scenarios such as intelligent customer service, negotiation decision support, privacy-protecting AI, and social robots.

## Future Directions and Conclusions

### Future Research
1. Enhance Theory of Mind capabilities
2. Optimize privacy-utility trade-offs
3. Learn information management strategies in multi-agent settings
4. Study cross-domain generalization
### Conclusions
Using realistic scenarios and a multi-dimensional evaluation, SOTOPIA-TOM systematically exposes the information management limitations of LLM agents. The large gains from ToM interventions suggest that explicit Theory of Mind modeling is a key direction for improving multi-agent systems, and the benchmark offers a common standard for related research.