The core insight of ACRoCo is that, rather than letting an LLM generate action instructions directly, the LLM should choose from a predefined space of legal actions. This design brings several significant advantages:
First, safety is enforced by construction. Legality masks let the system exclude dangerous or unexecutable actions before policy selection, preventing the robot from executing instructions that could cause damage or danger.
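To make the masking step concrete, here is a minimal sketch in Python. It assumes the LLM (or policy) produces one score per candidate action and that a separate checker supplies a boolean legality mask. The names `masked_softmax` and `select_action`, and the example action labels, are hypothetical illustrations rather than part of ACRoCo itself.

```python
import math


def masked_softmax(scores, legal_mask):
    """Softmax over scores in which illegal actions get probability 0."""
    if not any(legal_mask):
        raise ValueError("no legal action available")
    # Illegal actions are set to -inf, so exp() maps them to exactly 0.0.
    masked = [s if ok else float("-inf") for s, ok in zip(scores, legal_mask)]
    peak = max(masked)  # finite, because at least one action is legal
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(actions, scores, legal_mask):
    """Return the highest-probability legal action and the full distribution."""
    probs = masked_softmax(scores, legal_mask)
    best = max(range(len(actions)), key=lambda i: probs[i])
    return actions[best], probs


# Hypothetical example: "open_drawer" has the highest raw score,
# but the mask marks it as illegal, so it can never be selected.
actions = ["pick_cup", "move_to_table", "open_drawer", "wait"]
scores = [2.0, 0.5, 3.0, 0.1]
legal_mask = [True, True, False, True]

chosen, probs = select_action(actions, scores, legal_mask)
print(chosen)
```

Because the mask is applied before selection, an illegal action cannot win no matter how highly it is scored. The safety of the result still depends on the mask itself being correct.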
Second, interpretability improves substantially. Because the action space is explicitly defined, each decision can be traced back to a specific set of legal actions, which simplifies debugging and optimization.
Finally, training becomes significantly more efficient. Constraining the action space shrinks the policy search space, allowing reinforcement learning algorithms to converge to optimal policies faster.
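As a rough illustration of the reduction, assume a fixed horizon $H$ and the same number of available actions at every step. The symbols $|A|$ (all actions), $k$ (legal actions left after masking), and $H$ are introduced here for illustration and are not taken from ACRoCo. Under these simplifying assumptions, the number of distinct action sequences drops from $|A|^H$ to $k^H$:

```latex
\[
\frac{k^{H}}{|A|^{H}} = \left(\frac{k}{|A|}\right)^{H},
\qquad \text{e.g. } |A| = 20,\ k = 5,\ H = 10:\quad
\left(\tfrac{1}{4}\right)^{10} \approx 9.5 \times 10^{-7}.
\]
```

This is only a counting argument. Actual convergence speed also depends on the learning algorithm and the reward structure.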