# SWAP: Reconstructing Deliberative Reasoning of Language Models into a Structure-Aware Planning Framework

> The ACL 2025 main conference paper SWAP proposes a reasoning paradigm for language models that supports more deliberate multi-step reasoning by combining structure-aware planning with an accurate world model.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-11T19:14:03.000Z
- Last activity: 2026-04-11T19:18:58.563Z
- Popularity: 159.9
- Keywords: SWAP, ACL 2025, deliberate reasoning, structure-aware planning, world model, language models, multi-step reasoning, github
- Page link: https://www.zingnex.cn/en/forum/thread/swap
- Canonical: https://www.zingnex.cn/forum/thread/swap
- Markdown source: floors_fallback

---

## Introduction: SWAP Framework - A New Reasoning Paradigm Combining Structure-Aware Planning and World Models

The ACL 2025 main conference paper SWAP recasts language model reasoning as a structure-aware planning problem and pairs it with an accurate world model to support more deliberate multi-step reasoning. The framework targets a core weakness of traditional Chain-of-Thought methods: they lack explicit control and structured planning when the reasoning becomes complex.

## Research Background and Motivation

Current large language models face the core challenge of balancing reasoning depth and efficiency in complex reasoning tasks. Although traditional Chain-of-Thought methods improve reasoning capabilities, they lack explicit control and structured planning over the reasoning process, making it difficult to evaluate path effectiveness and to backtrack and correct errors effectively. To address this, the ACL 2025 main conference paper proposes the SWAP framework, which reconceptualizes reasoning as a structure-aware planning problem.

## Core Architecture of SWAP Framework: Collaboration Between Generator and Discriminator

The SWAP framework draws on classical AI planning theory and reinforcement learning, and consists of two core components: a generator and a discriminator.

### Three Roles of the Generator
- **Policy Model (M_π)**: Generates the reasoning plan and proposes the actions that make up the reasoning path;
- **World Model (M_wm)**: Predicts the state that results from executing an action and updates the implication graph, giving foresight into the outcome of each step;
- **Controller (M_c)**: Decides whether to continue reasoning or output the answer, which makes the process more controllable.

### Evaluation Mechanism of the Discriminator
The discriminator evaluates candidate reasoning trajectories and keeps the paths worth exploring in depth, so that compute is not wasted on unpromising ones.
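
The filtering role described above can be pictured as a scoring function followed by top-k selection. The sketch below is illustrative only: the `Trajectory` type, `select_top_k`, and the keyword-based `toy_score` are assumptions made for this example, not the paper's implementation. In SWAP the score would come from a trained discriminator model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """A candidate reasoning path as an ordered list of steps (hypothetical type)."""
    steps: List[str]


def select_top_k(
    candidates: List[Trajectory],
    score: Callable[[Trajectory], float],
    k: int,
) -> List[Trajectory]:
    """Keep the k highest-scoring candidates (sorted is stable, so ties keep input order)."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:k]


def toy_score(traj: Trajectory) -> float:
    # Placeholder scorer (assumption): penalise every step flagged as a contradiction.
    return -sum(1.0 for step in traj.steps if "contradiction" in step)


candidates = [
    Trajectory(["x = 2", "contradiction: x = 3"]),
    Trajectory(["x = 2", "y = x + 1"]),
    Trajectory(["contradiction", "contradiction"]),
]
kept = select_top_k(candidates, toy_score, k=2)
```

Passing the scorer in as a callable keeps the selection logic independent of how trajectories are actually evaluated.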

## Formal Description of SWAP Reasoning Process

Given a goal G and initial state (s₀, g₀), the SWAP reasoning process can be formally described as follows:
1. **Planning Phase**: The policy model generates an optimized reasoning plan H;
2. **Iterative Execution Phase**:
   - The policy model proposes an action a_t based on the goal, plan, and current state;
   - The world model predicts the next state s_{t+1} and updates the implication graph g_{t+1};
   - The controller decides to continue or terminate reasoning based on the updated state.
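
The two phases above can be sketched as a plain control loop. This is a minimal sketch under stated assumptions: the three models are passed in as ordinary callables (`plan_fn`, `act_fn`, `world_fn`, `stop_fn` are hypothetical names), the implication graph is reduced to a list of edges, and the `max_steps` cap is added here as a safety limit rather than taken from the paper.

```python
from typing import Callable, List, Tuple

State = str
Graph = List[Tuple[str, str]]  # implication graph as (premise, conclusion) edges


def swap_loop(
    goal: str,
    s0: State,
    g0: Graph,
    plan_fn: Callable[[str, State], str],                          # policy: plan H
    act_fn: Callable[[str, str, State], str],                      # policy: action a_t
    world_fn: Callable[[State, Graph, str], Tuple[State, Graph]],  # world model
    stop_fn: Callable[[str, State, Graph], bool],                  # controller
    max_steps: int = 10,                                           # assumption: safety cap
) -> Tuple[State, Graph, List[str]]:
    plan = plan_fn(goal, s0)                       # planning phase
    state, graph, actions = s0, list(g0), []
    for _ in range(max_steps):                     # iterative execution phase
        action = act_fn(goal, plan, state)
        state, graph = world_fn(state, graph, action)
        actions.append(action)
        if stop_fn(goal, state, graph):
            break
    return state, graph, actions


# Toy stand-ins: count upward from "0" until the state equals the goal.
def toy_world(state: State, graph: Graph, action: str) -> Tuple[State, Graph]:
    new_state = str(int(state) + 1)
    return new_state, graph + [(state, new_state)]


final_state, final_graph, taken = swap_loop(
    goal="3",
    s0="0",
    g0=[],
    plan_fn=lambda goal, s: "increment until the goal is reached",
    act_fn=lambda goal, plan, s: "+1",
    world_fn=toy_world,
    stop_fn=lambda goal, s, g: s == goal,
)
```

In the toy run the controller stops the loop as soon as the predicted state matches the goal, so the number of steps adapts to the distance between the initial state and the goal.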

## Unique Advantages of Structure-Aware Planning

SWAP uses a graph structure (implication graph) to represent reasoning states, which has unique advantages over traditional linear text sequences:
1. Naturally captures the branching and merging relationships of reasoning, adapting to the dependency structures of mathematical proofs and logical reasoning;
2. Facilitates backtracking and correction: can locate and correct nodes in the graph without regenerating the entire reasoning chain;
3. Improves interpretability: the reasoning logic can be inspected by visualizing the implication graph.
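
To make the backtracking advantage concrete, the sketch below stores derived statements in a small directed graph and retracts one faulty node together with everything derived from it, leaving unrelated branches intact. The `ImplicationGraph` class and its `add` and `retract` methods are hypothetical names chosen for illustration; the source does not describe how SWAP stores or repairs its graph.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


class ImplicationGraph:
    """Toy implication graph: an edge premise -> conclusion means the
    conclusion was derived using that premise (illustrative assumption)."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.children: Dict[str, Set[str]] = defaultdict(set)

    def add(self, conclusion: str, premises: Sequence[str] = ()) -> None:
        self.nodes.add(conclusion)
        for premise in premises:
            self.nodes.add(premise)
            self.children[premise].add(conclusion)

    def retract(self, node: str) -> Set[str]:
        """Remove a node and everything derived from it; return the removed set."""
        removed: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in removed or current not in self.nodes:
                continue
            removed.add(current)
            stack.extend(self.children.pop(current, ()))
        self.nodes -= removed
        for kids in self.children.values():
            kids -= removed  # drop dangling edges into removed nodes
        return removed


g = ImplicationGraph()
g.add("A")
g.add("B")
g.add("C", ["A", "B"])   # C follows from A and B
g.add("D", ["C"])        # D follows from C
g.add("E", ["B"])        # E follows from B only
removed = g.retract("C")  # C turned out to be wrong
```

After the retraction, `A`, `B`, and `E` remain available, so only the branch that depended on `C` has to be regenerated.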

## Experimental Validation: Performance Improvement on Multiple Reasoning Benchmarks

SWAP performs well on multiple reasoning benchmarks:
- **Mathematical Reasoning**: On GSM8K, fewer reasoning chains fail because of an early error, and overall performance improves significantly;
- **Logical Reasoning**: On FOLIO, the implication graph aligns with the logical structure of the task and accurately tracks the chain from premises to conclusions;
- **Adaptive Reasoning**: Reasoning depth adjusts to problem difficulty: simple problems converge quickly, while complex ones are explored in depth.

The evaluation covers mathematics (GSM8K, MATH), logic (FOLIO, ReClor), and programming (HumanEval, MBPP).

## Open-Source Resources: Promoting Reproducibility and Extension

The research team provides complete open-source resources:
- The codebase includes training scripts (supervised fine-tuning, SFT, for the generator and discriminator), evaluation scripts, and pre-trained model weights;
- Datasets (trajectory data, process supervision annotations) are released on Hugging Face;
- Distributed training is supported, and evaluation uses vLLM to accelerate inference.

Releasing these resources promotes reproducibility and provides a foundation for subsequent research.

## Future Implications and Conclusion

### Future Research Implications
- Draw inspiration from classical AI planning to explore the deep integration of reasoning and planning;
- Build more precise and general world models, optimizing their combination with pre-trained models;
- Deepen the collaboration mechanism between generator and discriminator to simulate human deliberative processes.

### Conclusion
The SWAP framework offers a new paradigm for language model reasoning by combining structure-aware planning with world models, and the paper was accepted to the ACL 2025 main conference. Its improved reasoning capabilities may help language models move closer to human-level performance on complex cognitive tasks.