Zing Forum


ACRoCo: A Multi-Robot Collaboration Framework Based on Action Constraints and LLM

ACRoCo is a multi-robot collaboration method that converts open-ended LLM planning into action-constrained decisions, enabling efficient collaboration through validity masking, MAPPO reinforcement learning, and hybrid strategies.

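Multi-robot collaboration, Large language models, Reinforcement learning, MAPPO, Embodied intelligence, Action constraints, Robot planning, LLM hallucination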
Published 2026-05-29 16:12

Section 01

ACRoCo Framework Overview: A Multi-Robot Collaboration Solution Combining LLM and Action Constraints

ACRoCo is a multi-robot collaboration method that converts open-ended LLM planning into action-constrained decisions. It targets the "hallucination" problem in LLM-generated plans, such as actions that cannot be executed or instructions that violate physical constraints. The framework achieves efficient collaboration through validity masking, MAPPO (Multi-Agent PPO) reinforcement learning, and hybrid strategies. Its core innovation is compressing the open-ended planning space into a finite set of executable actions, so that LLM reasoning is combined with physical constraints as the basis for reliable robot collaboration systems.
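The compression into a finite action set can be written compactly. The block below shows the standard form of invalid-action masking combined with the PPO clipped surrogate objective that MAPPO optimizes. The notation is ours, not the project's: m_{t,i} is robot i's binary validity mask over the finite action set A at step t, z_theta are the policy logits, and the advantage A-hat is assumed to come from a centralized critic V_phi(s_t) as in CTDE. We assume, rather than know, that ACRoCo applies the mask at the logit level in exactly this way. Because masked actions get zero probability they are never sampled, so the ratio is only evaluated on valid actions (assuming the rollout-time mask is stored and reused during the update).

```latex
\begin{align*}
\pi^{m}_{\theta}(a \mid o_{t,i})
  &= \frac{m_{t,i}(a)\, \exp z_{\theta}(a \mid o_{t,i})}
          {\sum_{b \in \mathcal{A}} m_{t,i}(b)\, \exp z_{\theta}(b \mid o_{t,i})},
  \qquad m_{t,i}(a) \in \{0, 1\} \\
r_{t,i}(\theta)
  &= \frac{\pi^{m}_{\theta}(a_{t,i} \mid o_{t,i})}
          {\pi^{m}_{\theta_{\text{old}}}(a_{t,i} \mid o_{t,i})} \\
L^{\text{CLIP}}(\theta)
  &= \mathbb{E}_{t,i}\!\left[\min\!\Big(r_{t,i}(\theta)\,\hat{A}_{t,i},\;
     \operatorname{clip}\big(r_{t,i}(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_{t,i}\Big)\right]
\end{align*}
```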

Section 02

Background and Challenges of LLM Applications in Multi-Robot Collaboration

In multi-robot collaboration, LLMs show strong planning and reasoning capabilities, but the plans they generate often contain instructions that cannot be executed or that violate physical constraints (the "hallucination" problem). Conventional approaches decouple LLM planning from low-level control, which leaves a semantic gap between what is planned and what can actually be executed. When the LLM proposes an invalid action, the system either fails or has to rely on complex fallback mechanisms. ACRoCo was proposed to address exactly this problem.

Section 03

Analysis of ACRoCo's Core Methods and Technical Architecture

The core idea of ACRoCo is to turn open-ended LLM planning into decisions under action constraints, filtering out invalid actions in advance through validity masking. The technical architecture has four parts:

  1. Factorized validity masking: the action space is decomposed into separate action heads, and the valid combinations across heads are computed dynamically from the current state.
  2. MAPPO with CTDE: policies are trained with MAPPO under centralized training with decentralized execution, and act only within the masked, physically feasible action set.
  3. Primitive-aware architecture: macro actions are composed of reusable primitives such as REACH and GRASP, aligning high-level strategy with low-level execution.
  4. Hierarchical phase-adaptive reward: the balance between semantic-layer and physical-layer rewards is adjusted dynamically by phase to improve training stability.
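As a rough illustration of factorized validity masking, the pure-Python sketch below uses two hypothetical heads (a macro action and a target object) and a hypothetical validity rule based on whether the gripper is holding something. It computes the valid (macro, object) pairs from the current state, masks the first head down to macros that have at least one valid object, then masks the second head conditioned on the sampled macro, so an invalid combination can never be drawn. MACROS, OBJECTS, make_validity, valid_pairs, and sample_factorized are illustrative names, not ACRoCo's API; in a learned policy the object logits would normally also be conditioned on the chosen macro.

```python
import math
import random

# Hypothetical two-head factorization: head 1 = macro action, head 2 = target object.
MACROS = ["PICK", "PLACE", "WAIT"]
OBJECTS = ["cube", "bin", "none"]


def make_validity(holding):
    """Hypothetical validity rule driven by whether the gripper is holding an object."""
    def is_valid(macro, obj):
        if macro == "WAIT":
            return obj == "none"
        if macro == "PICK":
            return (not holding) and obj == "cube"
        if macro == "PLACE":
            return holding and obj == "bin"
        return False
    return is_valid


def valid_pairs(is_valid):
    """Enumerate every (macro, object) combination allowed in the current state."""
    return {(m, o) for m in MACROS for o in OBJECTS if is_valid(m, o)}


def masked_softmax(logits, mask):
    """Softmax in which entries with mask=False receive exactly zero probability."""
    if not any(mask):
        raise ValueError("no valid action under this mask")
    top = max(x for x, ok in zip(logits, mask) if ok)  # subtract max for numerical stability
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_factorized(macro_logits, object_logits, pairs, rng):
    """Sample (macro, object) so that only valid combinations can ever be drawn."""
    # Head 1: keep only macros that have at least one valid object.
    macro_mask = [any((m, o) in pairs for o in OBJECTS) for m in MACROS]
    p_macro = masked_softmax(macro_logits, macro_mask)
    i = rng.choices(range(len(MACROS)), weights=p_macro)[0]
    # Head 2: mask objects conditioned on the macro that was just sampled.
    object_mask = [(MACROS[i], o) in pairs for o in OBJECTS]
    p_obj = masked_softmax(object_logits, object_mask)
    j = rng.choices(range(len(OBJECTS)), weights=p_obj)[0]
    # Joint log-probability of the factorized action (what a PPO-style ratio would use).
    log_prob = math.log(p_macro[i]) + math.log(p_obj[j])
    return (MACROS[i], OBJECTS[j]), log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = valid_pairs(make_validity(holding=False))
    # PLACE has the largest logit but is masked out because nothing is held yet.
    action, logp = sample_factorized([2.0, 5.0, 0.1], [0.3, 0.2, 0.1], pairs, rng)
    print(action, round(logp, 3))
```

Masking the heads one after another keeps each head small while still guaranteeing that the joint action lies in the valid set, which is the point of factorizing rather than enumerating every combination as one flat head.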

Section 04

Task-Adaptive Mechanism and Mask-Aware LLM Prompt Design

ACRoCo introduces a task-adaptive manager that automatically generates the action space and masks from task-definition components (objects, goals, reachability graphs, and so on), so that tasks such as Sort and Sweep can reuse the same training pipeline. On the LLM side, mask-aware prompting explicitly exposes the current set of valid actions to the LLM, restricting its decisions to valid options and substantially reducing the likelihood of hallucinated actions.
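The sketch below illustrates, under stated assumptions, how a task-adaptive manager and mask-aware prompting could fit together. TaskSpec, build_action_space, build_mask, mask_aware_prompt, and parse_choice are hypothetical names, and the prompt wording is invented for illustration. The mask here only checks reachability; a real mask would also depend on the robot's current state (for example, whether it is already holding an object). The prompt lists only the valid actions, and the reply parser rejects any index that is not currently valid, so a hallucinated choice is caught instead of executed.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical task definition: objects, goal placements, per-robot reachability."""
    name: str
    objects: list[str]
    goals: dict[str, str]            # object -> target zone
    reachable: dict[str, set[str]]   # robot -> objects and zones it can reach


def build_action_space(task):
    """Derive a finite, task-specific action list from the task definition."""
    actions = ["WAIT"]
    actions += [f"PICK {o}" for o in task.objects]
    actions += [f"PLACE {z}" for z in sorted(set(task.goals.values()))]
    return actions


def build_mask(task, actions, robot):
    """Mark an action valid only if its target is reachable for this robot (WAIT always is)."""
    reach = task.reachable.get(robot, set())
    mask = []
    for action in actions:
        if action == "WAIT":
            mask.append(True)
        else:
            _, target = action.split(" ", 1)
            mask.append(target in reach)
    return mask


def mask_aware_prompt(task, actions, mask, robot):
    """List only the currently valid actions, keeping their global indices."""
    lines = [
        f"Task: {task.name}. You control {robot}.",
        "Choose exactly one action by its number. Only these actions are currently valid:",
    ]
    lines += [f"{i}: {a}" for i, (a, ok) in enumerate(zip(actions, mask)) if ok]
    lines.append("Answer with the number only.")
    return "\n".join(lines)


def parse_choice(reply, mask):
    """Return the chosen index if the reply names a currently valid action, else None."""
    tokens = reply.split()
    if tokens and tokens[0].isdecimal():
        idx = int(tokens[0])
        if idx < len(mask) and mask[idx]:
            return idx
    return None


if __name__ == "__main__":
    task = TaskSpec(
        name="Sort",
        objects=["red_cube", "blue_cube"],
        goals={"red_cube": "red_bin", "blue_cube": "blue_bin"},
        reachable={"robot_a": {"red_cube", "red_bin"}},
    )
    actions = build_action_space(task)
    mask = build_mask(task, actions, "robot_a")
    print(mask_aware_prompt(task, actions, mask, "robot_a"))
```

Keeping the global indices in the prompt means the LLM's answer maps directly onto the same action space the RL policy uses, so a new task only needs a new TaskSpec rather than new prompt or policy code.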

Section 05

Experimental and Evaluation Support for ACRoCo

The project provides training and evaluation scripts covering RL benchmark tests (benchmark_rl.py), ablation experiments on the mask-aware hybrid strategy (benchmark_mask_aware.py), rollouts in real MuJoCo environments, and cross-task generalization tests. Training curves and ablation visualizations are stored in the figures directory for analyzing algorithm performance and convergence behavior.

Section 06

Practical Significance of ACRoCo and Implications for Embodied Intelligence

The value of ACRoCo lies in showing how LLM reasoning can be combined with physical constraints. Its methodology carries over to embodied intelligence more broadly:

  1. Constraints as interfaces: constraints are passed to the LLM explicitly, reducing invalid plans.
  2. Hierarchical hybrid architecture: LLM common-sense reasoning is combined with fine-grained RL control.
  3. Reusable training pipeline: the task-adaptive manager adapts quickly to new tasks.

It also provides a validation path and a codebase for deploying LLMs on real robot platforms.
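One way to realize the hybrid LLM/RL layer is sketched below; the gating rule and the optional llm_bonus blending are assumptions for illustration, not ACRoCo's documented strategy, and hybrid_act and masked_argmax are hypothetical names. The LLM's proposed action index is executed only if the validity mask allows it; otherwise the robot falls back to the RL policy's highest-scoring valid action. With llm_bonus greater than zero, a valid LLM proposal acts as a soft prior added to the RL logits instead of overriding them.

```python
import math


def masked_argmax(logits, mask):
    """Index of the highest logit among actions the mask allows."""
    best, best_val = None, -math.inf
    for i, (x, ok) in enumerate(zip(logits, mask)):
        if ok and x > best_val:
            best, best_val = i, x
    if best is None:
        raise ValueError("mask has no valid action")
    return best


def hybrid_act(llm_choice, rl_logits, mask, llm_bonus=0.0):
    """Combine an LLM proposal with RL policy logits for one robot (hypothetical rule).

    llm_choice: action index proposed by the LLM, or None if its reply could not be parsed.
    llm_bonus:  0 -> a valid proposal is executed as-is; > 0 -> it only shifts the RL logits.
    Returns (action_index, source) where source is "llm", "blend", or "rl".
    """
    llm_ok = llm_choice is not None and 0 <= llm_choice < len(mask) and mask[llm_choice]
    if not llm_ok:
        # Invalid or hallucinated proposal: fall back to the best valid RL action.
        return masked_argmax(rl_logits, mask), "rl"
    if llm_bonus <= 0.0:
        # Valid proposal and no blending: execute the LLM's choice directly.
        return llm_choice, "llm"
    # Valid proposal used as a soft prior on top of the RL policy.
    logits = list(rl_logits)
    logits[llm_choice] += llm_bonus
    return masked_argmax(logits, mask), "blend"


if __name__ == "__main__":
    mask = [True, False, True]
    rl_logits = [0.5, 3.0, 1.0]
    print(hybrid_act(1, rl_logits, mask))  # LLM proposes a masked action, so RL decides
```

Because both paths consult the same mask, neither the LLM nor the RL policy can emit an action outside the valid set; the bonus only decides how much weight the LLM's common-sense suggestion gets when it is valid.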

Section 07

ACRoCo Project Quick Start Guide

Installation Methods:

  • Using conda:
      conda env create -f conda.yml
      conda activate acroco
      python -m pip install -e .
  • Using uv:
      uv venv .venv
      source .venv/bin/activate
      uv pip install -r requirements.txt
      uv pip install -e .

Training Examples:

  • Sort task: python scripts/train/train_rl_sort.py --steps 30000 --save checkpoints/sort_mappo.pt
  • Sweep task: python scripts/train/train_rl_sweep.py --steps 30000 --save checkpoints/sweep_mappo.pt

Section 08

ACRoCo Framework Summary and Outlook

ACRoCo represents important progress in multi-robot collaboration, combining the open-domain reasoning of LLMs with physical constraints. Through validity masking, a factorized action space, and a hybrid strategy architecture, it offers a practical route to reliable, deployable robot collaboration systems. For researchers working on embodied intelligence, multi-agent reinforcement learning, and LLM applications, it is an open-source project worth following and learning from.