Zing Forum

SWAP: Reconstructing Deliberative Reasoning of Language Models into a Structure-Aware Planning Framework

The ACL 2025 main conference paper SWAP proposes a new reasoning paradigm for language models: by combining structure-aware planning with an accurate world model, it achieves more deliberative multi-step reasoning.

Tags: SWAP, ACL 2025, deliberate reasoning, structure-aware planning, world model, language models, multi-step reasoning, github
Published 2026-04-12 03:14 | Recent activity 2026-04-12 03:18 | Estimated read 8 min

Section 01

Introduction: SWAP Framework - A New Reasoning Paradigm Combining Structure-Aware Planning and World Models

The ACL 2025 main conference paper SWAP proposes a new reasoning paradigm for language models. It recasts reasoning as a structure-aware planning problem and couples it with an accurate world model to achieve more deliberative multi-step reasoning. The framework targets a core weakness of traditional Chain-of-Thought methods: they provide no explicit control or structured planning during complex reasoning.

Section 02

Research Background and Motivation

Current large language models struggle to balance reasoning depth against efficiency on complex reasoning tasks. Traditional Chain-of-Thought methods improve reasoning, but they offer no explicit control or structured planning over the reasoning process, so it is hard to judge whether a reasoning path is sound, or to backtrack and correct an error once it occurs. To address this, the ACL 2025 main conference paper proposes the SWAP framework, which reconceptualizes reasoning as a structure-aware planning problem.

Section 03

Core Architecture of SWAP Framework: Collaboration Between Generator and Discriminator

The SWAP framework draws on classical AI planning theory and reinforcement learning methods, and consists of two core components: a generator and a discriminator.

Three Roles of the Generator

  • Policy Model (M_π): Generates an optimized reasoning plan and structures the reasoning path;
  • World Model (M_wm): Predicts the state that results from executing an action and updates the implication graph, so outcomes can be anticipated before committing to them;
  • Controller (M_c): Decides whether to continue reasoning or output the answer, making the process more controllable.

Evaluation Mechanism of the Discriminator

The discriminator evaluates candidate reasoning trajectories and keeps only the paths worth exploring in depth, so that computation is not wasted on unpromising ones.
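
To make the discriminator's filtering role concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's discriminator, which is a trained model: `Trajectory`, `toy_step_check`, `toy_score`, and `select_trajectories` are hypothetical names, and the scoring rule (the fraction of arithmetic steps that check out) is an assumed stand-in for a learned score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    steps: List[str]  # each step is one reasoning move, e.g. "2+3=5"


def toy_step_check(step: str) -> bool:
    """Hypothetical verifier: accepts an 'a+b=c' step only if the arithmetic holds."""
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


def toy_score(traj: Trajectory) -> float:
    """Fraction of steps the toy verifier accepts (stand-in for a learned discriminator)."""
    if not traj.steps:
        return 0.0
    return sum(toy_step_check(s) for s in traj.steps) / len(traj.steps)


def select_trajectories(
    candidates: List[Trajectory],
    score: Callable[[Trajectory], float],
    k: int,
    threshold: float = 0.0,
) -> List[Trajectory]:
    """Keep at most k candidates whose score clears the threshold, best first."""
    scored = [(score(t), t) for t in candidates]
    kept = [(s, t) for s, t in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in kept[:k]]


candidates = [
    Trajectory(["2+3=5", "5+4=9"]),   # both steps valid
    Trajectory(["2+3=6", "6+4=10"]),  # early error, later step internally consistent
    Trajectory(["2+3=5", "5+4=10"]),  # late error
]
best = select_trajectories(candidates, toy_score, k=2, threshold=0.75)
print([t.steps for t in best])  # [['2+3=5', '5+4=9']]
```

Swapping `toy_score` for a learned scorer leaves `select_trajectories` unchanged; a threshold plus a top-k cut is just one simple way to realize "keep only the paths worth exploring".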

Section 04

Formal Description of SWAP Reasoning Process

Given a goal G, an initial state s₀, and an initial implication graph g₀, the SWAP reasoning process can be formally described as follows:

  1. Planning Phase: The policy model generates an optimized reasoning plan H;
  2. Iterative Execution Phase:
    • The policy model proposes an action a_t based on the goal, plan, and current state;
    • The world model predicts the next state s_{t+1} and updates the implication graph g_{t+1};
    • The controller decides to continue or terminate reasoning based on the updated state.
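
The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the released implementation: the model roles are passed in as plain functions (`plan_fn`, `act_fn`, `world_model_fn`, `controller_fn`, all hypothetical names), the toy versions replace the language models with integer arithmetic, and the implication graph is assumed to be a simple tuple of derivation edges.

```python
from typing import Any, Callable, List, Tuple

State = Any
Graph = Tuple[Tuple[Any, Any], ...]  # toy implication graph: (premise, conclusion) edges


def swap_reason(
    goal: Any,
    s0: State,
    g0: Graph,
    plan_fn: Callable[[Any, State, Graph], str],
    act_fn: Callable[[Any, str, State, Graph], Any],
    world_model_fn: Callable[[State, Graph, Any], Tuple[State, Graph]],
    controller_fn: Callable[[Any, State, Graph], bool],
    max_steps: int = 32,
) -> Tuple[State, Graph, List[Any]]:
    # Planning phase: the policy model drafts the plan H once, up front.
    plan = plan_fn(goal, s0, g0)
    s, g = s0, g0
    actions: List[Any] = []
    for _ in range(max_steps):  # hard cap so a faulty controller cannot loop forever
        a = act_fn(goal, plan, s, g)       # policy model proposes a_t
        s, g = world_model_fn(s, g, a)     # world model predicts s_{t+1} and g_{t+1}
        actions.append(a)
        if not controller_fn(goal, s, g):  # controller: stop and output the answer
            break
    return s, g, actions


# Hypothetical toy stand-ins: "reason" from the integer s0 to the integer goal.
def toy_plan(goal, s0, g0):
    return f"move from {s0} to {goal} one unit at a time"


def toy_act(goal, plan, s, g):
    return 1 if s < goal else -1  # this toy policy ignores the plan text


def toy_world_model(s, g, a):
    s_next = s + a
    return s_next, g + ((s, s_next),)  # record the derivation edge s -> s_next


def toy_controller(goal, s, g):
    return s != goal  # True means "keep reasoning"


final, graph, actions = swap_reason(3, 0, (), toy_plan, toy_act, toy_world_model, toy_controller)
print(final, graph)  # 3 ((0, 1), (1, 2), (2, 3))
```

Keeping the three roles as separate functions mirrors the division of labor described above: the controller only ever sees the state the world model predicted, never the raw action.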

Section 05

Unique Advantages of Structure-Aware Planning

SWAP represents reasoning states as a graph (the implication graph) rather than as a linear text sequence, which brings three advantages:

  1. It naturally captures the branching and merging of reasoning steps, matching the dependency structure of mathematical proofs and logical arguments;
  2. It facilitates backtracking and correction: a faulty node can be located and corrected in the graph without regenerating the entire reasoning chain;
  3. It improves interpretability: the reasoning logic can be inspected by visualizing the implication graph.
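
The backtracking advantage in point 2 can be illustrated with a small graph structure. The sketch below uses a hypothetical `ImplicationGraph` class, not SWAP's actual representation: each node is a statement that records the premises it was derived from, and `retract` removes a faulty node together with everything derived from it while leaving independent branches untouched.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class ImplicationGraph:
    """Hypothetical implication graph: nodes are statements, edges run premise -> conclusion."""

    def __init__(self) -> None:
        self.premises: Dict[str, Tuple[str, ...]] = {}         # node -> premises it came from
        self.children: Dict[str, Set[str]] = defaultdict(set)  # node -> nodes derived from it

    def add(self, node: str, premises: Iterable[str] = ()) -> None:
        premises = tuple(premises)
        for p in premises:
            if p not in self.premises:
                raise KeyError(f"unknown premise: {p!r}")
        self.premises[node] = premises
        for p in premises:
            self.children[p].add(node)

    def descendants(self, node: str) -> Set[str]:
        """All nodes that (transitively) depend on `node`."""
        seen: Set[str] = set()
        stack = [node]
        while stack:
            for child in self.children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def retract(self, node: str) -> Set[str]:
        """Remove `node` and everything derived from it; unrelated branches stay intact."""
        doomed = {node} | self.descendants(node)
        for n in doomed:
            for p in self.premises.pop(n, ()):
                self.children.get(p, set()).discard(n)
            self.children.pop(n, None)
        return doomed


g = ImplicationGraph()
g.add("x = 2")
g.add("y = 3")
g.add("x + 1 = 4", ["x = 2"])                        # faulty step
g.add("2y = 6", ["y = 3"])
g.add("(x + 1) + 2y = 10", ["x + 1 = 4", "2y = 6"])

removed = g.retract("x + 1 = 4")                     # drops the bad step and the total built on it
g.add("x + 1 = 3", ["x = 2"])                        # re-derive only the affected branch
g.add("(x + 1) + 2y = 9", ["x + 1 = 3", "2y = 6"])
print(sorted(removed))  # ['(x + 1) + 2y = 10', 'x + 1 = 4']
```

Only the branch that depended on the faulty step is rebuilt; the "2y = 6" branch is reused as-is, which is the saving over regenerating a linear chain from scratch.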

Section 06

Experimental Validation: Performance Improvement on Multiple Reasoning Benchmarks

The paper reports that SWAP improves performance on multiple reasoning benchmarks:

  • Mathematical Reasoning: On GSM8K, it reduces failures in which an early error derails the rest of the chain, yielding a significant performance improvement;
  • Logical Reasoning: On FOLIO, the implication graph mirrors the logical structure of the problem, so the chain from premises to conclusions is tracked accurately;
  • Adaptive Reasoning: It adjusts reasoning depth to problem difficulty, converging quickly on simple problems and exploring more deeply on complex ones.

The evaluation covers mathematics (GSM8K, MATH), logic (FOLIO, ReClor), and programming (HumanEval, MBPP).

Section 07

Open-Source Resources: Promoting Reproducibility and Extension

The research team provides complete open-source resources:

  • The codebase includes training scripts (supervised fine-tuning, SFT, of the generator and discriminator), evaluation scripts, and pre-trained model weights;
  • Datasets (trajectory data and process-supervision annotations) are released on Hugging Face;
  • The code supports distributed training, and evaluation uses vLLM to accelerate inference.

Releasing these resources supports reproducibility and gives follow-up research a foundation to build on.

Section 08

Future Implications and Conclusion

Future Research Implications

  • Draw inspiration from classical AI planning to explore the deep integration of reasoning and planning;
  • Build more precise and general world models, optimizing their combination with pre-trained models;
  • Deepen the collaboration mechanism between generator and discriminator to simulate human deliberative processes.

Conclusion

The SWAP framework offers a new paradigm for language model reasoning by combining structure-aware planning with a world model, and was accepted to the ACL 2025 main conference. The stronger reasoning it enables could help language models move closer to human-level performance on complex cognitive tasks.