MASPO: A New Framework for Joint Prompt Optimization in Multi-Agent Systems

The performance of multi-agent systems heavily depends on the quality of role prompts, but joint optimization across agents faces the challenge of misalignment between local and global objectives. MASPO achieves an average improvement of 2.9 percentage points across 6 tasks through a joint evaluation mechanism and data-driven evolutionary beam search, and has been accepted by ICML 2026.

Tags: multi-agent systems, prompt optimization, large language models, MAS, evolutionary algorithms, joint optimization, ICML 2026

Published 2026-05-08 01:35 | Recent activity 2026-05-10 00:53 | Estimated read 9 min

Section 01

Introduction to the MASPO Framework: A New Breakthrough in Joint Prompt Optimization for Multi-Agent Systems

MASPO is a framework for jointly optimizing the role prompts of all agents in a multi-agent system, rather than tuning each agent in isolation. This thread walks through its core content in separate posts: background, method, and experimental results.

Section 02

Challenges in Multi-Agent Systems: The Dilemma of Prompt Optimization

The Rise and Challenges of Multi-Agent Systems

Large language model-based multi-agent systems (MAS) have become powerful tools for solving complex collaborative tasks, applied in fields such as software development and scientific research. Prompts are the "soul" of MAS, defining agents' identities, capabilities, and interaction methods, directly affecting professionalism, collaboration fluency, and overall system performance. However, joint prompt optimization across agents faces three major dilemmas:

Local-global misalignment: Optimizing a single agent's prompt in isolation may harm overall performance (e.g., a dominant agent suppresses the others);
High-dimensional search space: The space of prompt combinations grows exponentially with the number of agents, making manual tuning impractical;
Evaluation difficulty: Open-ended tasks lack a clear ground truth, making it hard to tell which direction an optimization should take.

Section 03

Core of the MASPO Framework: Joint Evaluation and Evolutionary Beam Search

Core Innovations of the MASPO Framework

To address the above challenges, the MASPO (Multi-Agent System Prompt Optimization) framework proposes two core innovations:

Joint Evaluation Mechanism

Unlike traditional methods that evaluate only the local performance of individual agents, MASPO scores a prompt by whether it helps downstream agents succeed, which bridges the gap between local interactions and global results. This criterion does not require ground truth, so it also applies to open-ended tasks.
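The idea of scoring an upstream prompt by downstream success can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Agent` signature, `joint_score`, the `downstream_succeeded` check, and the toy planner/solver agents are all hypothetical stand-ins (in practice the agents would be LLM calls and the success check might be a validity test or a judge model).

```python
from typing import Callable, Sequence

# Assumption: an agent is any callable mapping (role_prompt, message) -> output text.
Agent = Callable[[str, str], str]


def joint_score(
    candidate_prompt: str,
    upstream: Agent,
    downstream: Agent,
    downstream_prompt: str,
    downstream_succeeded: Callable[[str], bool],
    task_inputs: Sequence[str],
) -> float:
    """Fraction of inputs on which the downstream agent succeeds when the
    upstream agent runs with `candidate_prompt`. No ground-truth labels are used."""
    if not task_inputs:
        return 0.0
    wins = 0
    for x in task_inputs:
        handoff = upstream(candidate_prompt, x)          # upstream output
        result = downstream(downstream_prompt, handoff)  # consumed downstream
        wins += bool(downstream_succeeded(result))
    return wins / len(task_inputs)


# Toy stand-ins for LLM calls (hypothetical behaviour, for illustration only).
def toy_planner(prompt: str, message: str) -> str:
    # Emits a numbered plan only when the prompt asks for steps.
    return f"1. parse {message}\n2. solve" if "steps" in prompt else message


def toy_solver(prompt: str, message: str) -> str:
    # Can only act on a numbered plan.
    return "DONE" if message.startswith("1.") else "CONFUSED"


inputs = ["task A", "task B"]
is_done = lambda r: r == "DONE"
vague = joint_score("Be helpful.", toy_planner, toy_solver, "Solve.", is_done, inputs)
structured = joint_score("List the steps.", toy_planner, toy_solver, "Solve.", is_done, inputs)
```

In this toy setup the planner prompt is never judged on its own output; it is judged only by what the solver manages to do with that output, which is the local-to-global link described above.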

Data-Driven Evolutionary Beam Search

To handle the high-dimensional space, MASPO adopts an evolutionary beam search strategy:

Population initialization: Generate a candidate population by mutating the current prompts;
Joint evaluation and selection: Score every candidate with the joint evaluation and retain the top k (k is the beam width);
Iterative evolution: Repeat mutation, evaluation, and selection to improve quality step by step;
Cross-agent coordination: While one agent's prompt is being optimized, hold the other agents at their current best prompts so that candidates are compared on equal footing.
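The four steps above can be sketched as a small loop. This is a rough sketch under stated assumptions, not MASPO's actual code: `evolve_agent_prompt`, `optimize_system`, the `mutate` and `score` callables, and all default parameter values are hypothetical, and the toy mutation and scoring functions merely stand in for LLM-driven rewriting and joint evaluation.

```python
import random
from typing import Callable, Dict, Optional

Prompts = Dict[str, str]  # agent name -> role prompt


def evolve_agent_prompt(
    agent: str,
    prompts: Prompts,
    mutate: Callable[[str, random.Random], str],
    score: Callable[[Prompts], float],
    beam_width: int = 2,
    children_per_parent: int = 3,
    iterations: int = 3,
    rng: Optional[random.Random] = None,
) -> str:
    """Beam search over one agent's prompt; all other agents stay fixed."""
    rng = rng or random.Random(0)

    def joint(candidate: str) -> float:
        # Score the whole system with only this agent's prompt swapped in.
        return score({**prompts, agent: candidate})

    beam = [prompts[agent]]
    for _ in range(iterations):
        children = [mutate(p, rng) for p in beam for _ in range(children_per_parent)]
        # Parents stay in the pool, so the best score never decreases.
        pool = list(dict.fromkeys(beam + children))  # de-duplicate, keep order
        pool.sort(key=joint, reverse=True)
        beam = pool[:beam_width]
    return beam[0]


def optimize_system(prompts: Prompts, mutate, score, rounds: int = 2, **kwargs) -> Prompts:
    """Optimize agents one at a time against the others' current best prompts."""
    best = dict(prompts)
    for _ in range(rounds):
        for agent in list(best):
            best[agent] = evolve_agent_prompt(agent, best, mutate, score, **kwargs)
    return best


# Toy mutation and score (hypothetical stand-ins for LLM rewriting and joint evaluation).
HINTS = ["cite sources", "be concise", "show steps"]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    return prompt + " " + rng.choice(HINTS)


def toy_score(prompts: Prompts) -> float:
    return float(sum(h in p for p in prompts.values() for h in HINTS))


start = {"planner": "You plan.", "coder": "You code."}
best = optimize_system(start, toy_mutate, toy_score, rng=random.Random(0))
```

Keeping the parents in the candidate pool is a design choice of this sketch (a simple form of elitism); whether MASPO does the same is not stated in the summary above.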

Section 04

Experimental Validation: Excellent Performance Across Six Tasks

Experimental Validation Results

The research team verified the effectiveness of MASPO on 6 diverse tasks:

Task Types

Covers collaborative reasoning, role-play dialogue, code generation and review, creative writing collaboration, information retrieval and synthesis, and decision support systems.

Main Results

Average accuracy improves by 2.9 percentage points, outperforming state-of-the-art methods;
MASPO outperforms the baselines on all six tasks, with no performance degradation on any of them;
The evolutionary beam search converges quickly.

Baseline Comparison

Single-agent methods (e.g., OPRO, PromptBreeder): Ignore inter-agent impacts and perform poorly;
Manual tuning: Cannot achieve the effect of automatic optimization;
Naive joint optimization: Easily falls into local optima and performs worse than MASPO.

Section 05

Key Findings: Essential Insights into Prompt Optimization

Key Findings and Insights

Downstream success is an effective metric: Measuring how much a prompt helps subsequent agents matches what a MAS actually needs better than measuring each agent locally;
Advantage of evolutionary search: It is naturally suited to discrete text spaces and is less likely to get stuck in local optima;
Prompt dependency: Adjusting one agent's prompt triggers chain reactions in the others, which is why joint optimization is necessary.

Section 06

Limitations and Future Directions: Improvement Paths for MASPO

Limitations and Future Directions

Limitations

High computational overhead: Evolutionary search requires executing the full MAS many times.

Future Directions

Efficient evaluation strategies: Use proxy models to predict prompt quality and reduce actual executions;
Dynamic environment adaptation: Explore online/continuous optimization versions;
Interpretability enhancement: Improve the ability to explain optimization results;
Cross-task transfer: Study cross-task reuse of optimization strategies.
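The first direction above, using a proxy to cut down on real executions, could look roughly like the following. This is only an illustrative sketch of one possible design: `prefilter_then_evaluate`, the length-based proxy, and the toy `expensive_eval` are hypothetical, and nothing here is taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def prefilter_then_evaluate(
    candidates: Sequence[str],
    proxy_score: Callable[[str], float],
    full_score: Callable[[str], float],
    keep: int = 2,
) -> Tuple[str, Dict[str, float]]:
    """Rank all candidates with a cheap proxy, then run the expensive
    evaluation only on the top `keep` of them."""
    shortlist = sorted(candidates, key=proxy_score, reverse=True)[:keep]
    results = {c: full_score(c) for c in shortlist}
    best = max(results, key=results.get)
    return best, results


# Toy example: the proxy is prompt length (a deliberately crude heuristic),
# and the "expensive" evaluation records how often it is called.
calls: List[str] = []


def expensive_eval(prompt: str) -> float:
    calls.append(prompt)  # stands in for one full MAS execution
    return 1.0 if "steps" in prompt else 0.0


candidates = [
    "Plan.",
    "Plan in numbered steps.",
    "Plan very, very, very carefully.",
]
best_prompt, evaluated = prefilter_then_evaluate(candidates, len, expensive_eval, keep=2)
```

Here only two of the three candidates trigger a full evaluation. The saving depends entirely on how well the proxy correlates with the real score; a poor proxy can discard the best candidate before it is ever run.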

Section 07

Practical Value and Academic Recognition: Applications of MASPO and ICML Acceptance

Practical Application Value and Academic Recognition

Application Value

Lower barrier to entry: Reduce reliance on prompt engineering experts;
Better system performance: Discover prompt combinations that humans are unlikely to think of;
Accelerate iteration: Shorten tuning cycles and support rapid prototyping and A/B testing;
Standardized evaluation: Provide a joint evaluation framework for fair comparison of solutions.

Academic Recognition

MASPO has been accepted by ICML 2026, and the paper's code is open-sourced to facilitate community reproduction and extension.
