Zing Forum


AgentSlimming: The 'Slimming' Approach for Multi-Agent Systems, Reducing Token Costs by 78.9%

The AgentSlimming framework evaluates agent importance via a hybrid mechanism, removes redundant agents or replaces them with low-cost alternatives, reducing the token cost of multi-agent systems by 78.9% while maintaining performance.

Tags: Multi-agent systems · Model compression · Cost optimization · Token efficiency · Agent pruning · MAS
Published 2026-05-09 17:03 · Recent activity 2026-05-12 13:26 · Estimated read 6 min

Section 01

Introduction: AgentSlimming—An Efficient Slimming Solution for Multi-Agent Systems

Large language model (LLM)-based multi-agent systems (MAS) perform well on complex tasks, but as the number of agents grows, token consumption balloons. The AgentSlimming framework evaluates agent importance via a hybrid mechanism and removes redundant agents or replaces them with low-cost alternatives, cutting token costs by 78.9% while maintaining performance, a practical route to efficiency optimization for multi-agent systems.


Section 02

Background: Why Do Multi-Agent Systems 'Gain Weight'?

The root causes of multi-agent systems 'gaining weight' include:

  1. Manual design limitations: Designers working from experience tend to add redundant 'insurance' agents;
  2. Side effects of automated expansion: Expansion pipelines lack a pruning mechanism, so agents are rarely removed once added;
  3. Redundancy cascade effect: Unnecessary agents not only consume resources themselves but also amplify interaction overhead.

Section 03

Methodology: AgentSlimming's Three-Layer Compression Mechanism

AgentSlimming draws on the pruning and quantization ideas from neural network compression, with a core three-layer compression mechanism:

  1. Hybrid importance assessment: Evaluate agent value from multiple dimensions—structure (position in communication graph), function (task contribution), and interaction (criticality of information flow);
  2. Dual-mode compression: Remove low-importance agents or replace high-cost agents with low-cost alternatives;
  3. Baseline-anchored acceptance rule: Verify performance after compression; if the drop exceeds the threshold, roll back to ensure safe slimming.
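The three steps above can be sketched as a single compression loop. This is a minimal illustrative sketch, not the paper's implementation: the `Agent` fields, the weights inside `importance`, and the thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    structural: float   # position in the communication graph (e.g. centrality)
    functional: float   # contribution to task success
    interaction: float  # criticality of the information it routes
    cost: float         # average tokens consumed per run
    has_cheap_substitute: bool = False

def importance(a: Agent, w=(0.3, 0.4, 0.3)) -> float:
    """Layer 1 -- hybrid importance: weighted mix of the three dimensions.
    The weights are illustrative, not from the paper."""
    return w[0] * a.structural + w[1] * a.functional + w[2] * a.interaction

def slim(agents, evaluate, keep_threshold=0.35, max_drop=0.02):
    """Layers 2 and 3 -- dual-mode compression with a baseline-anchored
    acceptance rule. `evaluate(team)` returns task performance for a
    candidate team; any change that drops performance by more than
    `max_drop` relative to the uncompressed baseline is rolled back."""
    baseline = evaluate(agents)
    team = list(agents)
    # Consider the least important agents first.
    for a in sorted(agents, key=importance):
        if a not in team or importance(a) >= keep_threshold:
            continue
        candidate = [x for x in team if x is not a]
        if a.has_cheap_substitute:
            # Mode 2: replace with a low-cost alternative instead of removing.
            candidate.append(Agent(a.name + "-lite", a.structural,
                                   a.functional, a.interaction,
                                   cost=a.cost * 0.2))
        if baseline - evaluate(candidate) <= max_drop:
            team = candidate  # accept: removal/substitution is safe
        # else: roll back by keeping `team` unchanged
    return team
```

In this sketch the rollback is implicit: a candidate team only replaces the current one when the performance drop stays within the accepted budget, which mirrors the "verify, then roll back if the drop exceeds the threshold" rule.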

Section 04

Evidence: Experimental Results of 78.9% Cost Reduction

Experimental results show:

  • Token cost reduction: 78.9% on average, exceeding 90% in the best case;
  • Performance maintenance: The drop is negligible, and performance on some tasks actually improves;
  • Reasons for performance improvement: Removing redundancy reduces information noise, simplifies coordination decisions, and focuses resources on core agents.

Section 05

Application Value: Benefits for Developers, Enterprises, and Researchers

The application value of AgentSlimming includes:

  • Developers: Reduce experiment costs, simplify system design, and ensure performance;
  • Enterprise users: Cut API fees, improve response speed, and ease maintenance;
  • Researchers: Understand agent contributions, guide system design, and enable open-source collaboration.

Section 06

Limitations and Future Directions: From Static to Dynamic Exploration

Current limitations:

  1. Static compression: Targets static workflows; compressing dynamic systems remains an open problem;
  2. Task dependency: The effect varies across tasks;
  3. Substitute dependence: Relies on the availability of low-cost agent substitutes.

Future directions: dynamic compression, adaptive thresholds, cross-task transfer, and multi-objective optimization.

Section 07

Open Source and Community: Promoting the Ecosystem of Multi-Agent Systems

The AgentSlimming code has been open-sourced on GitHub, with the following significance:

  • Reproducibility: Facilitates verification and extended experiments;
  • Community contributions: Supports the development of new compression strategies;
  • Ecosystem building: Promotes standardization of multi-agent system tools.

Section 08

Conclusion: The Value of 'Subtraction' in AI System Design

AgentSlimming achieves efficient slimming of multi-agent systems, with a core insight: 'Subtraction' in AI system design is harder but more valuable than 'addition'. It provides a feasible path for multi-agent systems to transition from bloated to streamlined, and from expensive to efficient—representing an elevation of technological progress and design philosophy.