CoT-Flow: Reshaping the Reasoning Paradigm of Large Language Models with Probabilistic Flow

The ACL 2026 paper CoT-Flow reconceptualizes discrete reasoning steps as continuous probabilistic flows and quantifies each step's contribution to the correct answer via Probabilistic Flow Progress (PFP). On this basis it delivers two capabilities: inference acceleration without additional training, and reinforcement learning alignment based on dense rewards.

Tags: CoT-Flow, Chain-of-Thought, Probabilistic Flow Reasoning, ACL 2026, Large Language Models, Inference Optimization, Reinforcement Learning, Dense Rewards, Greedy Decoding
Published 2026-04-16 20:08 · Recent activity 2026-04-16 20:21 · Estimated read: 5 min

Section 01

CoT-Flow: Reshaping the Reasoning Paradigm of Large Language Models (Introduction)

This article introduces the ACL 2026 accepted paper CoT-Flow, whose core idea is to recast discrete reasoning steps as continuous probabilistic flows and to quantify each step's contribution to the correct answer via Probabilistic Flow Progress (PFP). The method delivers two main capabilities: inference acceleration without additional training, and reinforcement learning alignment based on dense rewards.


Section 02

Background: The Granularity Dilemma of Chain-of-Thought Reasoning

Current chain-of-thought (CoT) reasoning in LLMs has two limitations: intermediate steps are discrete token sequences, and there is no mechanism for quantifying the information gain of each step. The result is lengthy reasoning traces, high compute cost at inference time, and sparse reward signals during training, which together make fine-grained alignment and optimization difficult.


Section 03

Core Innovation of CoT-Flow: Probabilistic Flow Reasoning Framework

CoT-Flow proposes a unified framework that recasts discrete reasoning steps as continuous probabilistic flows. Its core quantity, Probabilistic Flow Progress (PFP), measures each step's contribution toward the correct answer. The framework serves two roles: at inference time, greedy flow decoding selects efficient paths; at training time, the cumulative nature of the probabilistic flow yields a dense reward function.
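The article does not give the paper's formal definition of PFP. As a minimal illustrative sketch (not the paper's exact formula), one can model PFP as the gain in the model's log-probability of the correct answer contributed by each successive reasoning step; the per-step gains then telescope to the total progress of the chain:

```python
def pfp(answer_logprobs):
    """Illustrative Probabilistic Flow Progress (a sketch, not the
    paper's exact formula): the per-step gain in log P(answer | prefix)
    as reasoning steps accumulate.

    answer_logprobs: log P(answer | steps[:i]) for i = 0..n,
    i.e. the answer's log-probability after each prefix of the chain.
    Returns the list of per-step progress deltas.
    """
    return [b - a for a, b in zip(answer_logprobs, answer_logprobs[1:])]

# Hypothetical trace: the answer's log-probability rises as useful
# steps accumulate; a near-zero delta marks a low-contribution step.
trace = [-5.0, -3.2, -3.1, -0.7]
print(pfp(trace))  # per-step contributions to reaching the answer
```

Because the deltas telescope, their sum equals the total log-probability gain of the whole chain, which is what makes this quantity cumulative and usable as a dense signal.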


Section 04

Implementation Path 1: Training-Free Greedy Flow Decoding

This module extracts efficient reasoning paths without any additional training: by greedily selecting tokens with high PFP scores, it finds a short semantic path to the answer without relying on external verifiers. The implementation is built on the SGLang framework; users can try the acceleration by installing the dependencies and running the provided shell scripts.
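The greedy selection described above can be sketched abstractly. The following toy loop (not the paper's implementation; all names are hypothetical) extends the path with whichever candidate step scores highest, and stops early when no candidate makes sufficient progress, which is what shortens the chain:

```python
def greedy_flow_decode(candidates_fn, score_fn, start,
                       max_steps=10, min_progress=0.05):
    """Toy greedy flow decoding (illustrative, not the paper's code):
    at each step, extend the path with the candidate whose progress
    score is highest; stop once no candidate clears min_progress,
    so redundant steps are never appended."""
    path = [start]
    for _ in range(max_steps):
        candidates = candidates_fn(path)
        if not candidates:
            break
        best = max(candidates, key=lambda c: score_fn(path, c))
        if score_fn(path, best) < min_progress:
            break  # no step makes real progress: terminate the chain
        path.append(best)
    return path

# Toy setting: states are integers, the "answer" is 5, and the score
# plays the role of PFP: how much closer a candidate moves us.
target = 5
def candidates_fn(path):
    return [path[-1] + 1, path[-1] - 1]
def score_fn(path, c):
    return abs(target - path[-1]) - abs(target - c)

print(greedy_flow_decode(candidates_fn, score_fn, start=0))
# → [0, 1, 2, 3, 4, 5]: the decode stops as soon as it reaches 5,
# since neither neighbor of 5 makes further progress.
```

The early-stop threshold is what distinguishes this from ordinary greedy decoding: it ends the chain when marginal progress vanishes, rather than continuing until a length limit.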


Section 05

Implementation Path 2: Flow-Based Reinforcement Learning

This module integrates CoT-Flow into the reinforcement learning loop: the cumulative nature of the probabilistic flow yields dense rewards that penalize redundant steps and robustly align the policy. Built on the oat framework (following the VeriFree approach), dense rewards provide finer-grained feedback than sparse outcome rewards, making policy optimization more stable and efficient.
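The article does not spell out the paper's reward shaping, but a minimal sketch of the idea (hypothetical, not the paper's exact scheme) is to reward each step by its PFP gain and to penalize steps that make no measurable progress, so every step in the chain receives feedback rather than a single sparse signal at the end:

```python
def dense_rewards(pfp_deltas, redundancy_penalty=0.1):
    """Illustrative dense reward shaping (a sketch, not the paper's
    exact scheme): each step's reward is its PFP gain; steps with no
    positive progress are additionally penalized as redundant,
    discouraging verbose reasoning chains."""
    return [d if d > 0 else d - redundancy_penalty for d in pfp_deltas]

# A sparse outcome reward would emit one scalar at the end of the
# chain; here every step gets its own signal, so credit assignment
# during policy optimization is much finer-grained.
print(dense_rewards([1.8, 0.0, -0.2, 2.4]))
```

In an RL loop these per-step rewards would replace (or augment) the terminal correctness reward, which is the contrast with sparse-reward alignment drawn in the section above.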


Section 06

Experimental Validation: Balance Between Efficiency and Performance

On benchmarks such as AIME 2024 and MATH-500, CoT-Flow strikes a strong balance between inference efficiency and performance: it significantly reduces the number of reasoning steps while maintaining or even improving accuracy, which matters for deploying LLMs in resource-constrained settings.


Section 07

Technical Implementation and Open-Source Contributions

The codebase is divided into two sub-projects: cot-flow-greedy-decoding/ (inference optimization module) and cot-flow-rl/ (RL training module), with a modular design for easy reuse. The paper has been published on arXiv (2601.09260) and accepted by ACL 2026, and its open-source release provides new research directions and tools for the community.


Section 08

Conclusion: Significant Progress in CoT Reasoning

CoT-Flow marks significant progress in chain-of-thought reasoning research: it addresses inference efficiency and opens new possibilities for RL alignment. It is a project worth following for researchers and engineers working on LLM reasoning optimization, efficient path search, and RL alignment.