# ReasoningFlow: Uncovering the Hidden Logic of Large Language Model Reasoning Processes Using Discourse Structure Graphs

> ReasoningFlow is a framework that represents the reasoning trajectories of large language models (LLMs) as directed acyclic graphs (DAGs). An analysis of 1,260 reasoning trajectories (roughly 247,000 steps) reveals structural similarities in reasoning across different models and a complex relationship between erroneous steps and final answers.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-03T20:12:26.000Z
- Last activity: 2026-06-05T08:51:03.802Z
- Popularity: 116.4
- Keywords: large language models, reasoning trajectories, interpretability, directed acyclic graphs, chain of thought, model evaluation, discourse structure, DeepSeek, Qwen
- Page link: https://www.zingnex.cn/en/forum/thread/reasoningflow
- Canonical: https://www.zingnex.cn/forum/thread/reasoningflow
- Markdown source: floors_fallback

---

## Introduction: ReasoningFlow – A DAG Framework for Analyzing LLM Reasoning Trajectories

ReasoningFlow targets three challenges in analyzing the reasoning of large reasoning models (LRMs): limited interpretability, difficulty of monitoring, and the lack of cross-model comparison. It does so by treating a reasoning trajectory as a graph of steps and relations, not as a linear sequence.

## Research Background and Challenges

Large reasoning models (e.g., DeepSeek-R1, QwQ-32B) solve complex problems by generating reasoning trajectories that include non-linear thinking processes such as hypothesis proposal, verification, backtracking, and self-correction. Analyzing these trajectories raises three major challenges:
1. **Interpretability Dilemma**: Traditional linear evaluation struggles to capture branching, looping, and correction behaviors in reasoning;
2. **Monitoring Difficulty**: There is no systematic analysis framework for understanding how erroneous steps affect the final answer;
3. **Cross-Model Comparison**: It is unclear whether models with different architectures or training data reason in similar or different ways.

## ReasoningFlow Framework and Technical Implementation

The ReasoningFlow framework models reasoning trajectories as DAGs, drawing on the concept of linguistic discourse structure, where steps are nodes and logical relationships are edges. Core design principles:
- Non-linear modeling: Express complex patterns like hypothesis testing and backtracking;
- Fine-grained analysis: Track step contributions and dependencies;
- Computability: Support automated analysis via graph algorithms.

Technical implementation includes:
1. DAG construction algorithms (step segmentation, relationship identification, graph building, attribute annotation);
2. Visualization tools (interactive exploration, statistical summaries, comparative analysis of reasoning paths across different models).
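
The graph model described above can be sketched in a few lines of Python. This is a minimal illustration and not the ReasoningFlow implementation: the class `ReasoningGraph`, the helper `segment_steps`, the line-based segmentation rule, and the node and edge labels (`assumption`, `deduction`, `verification`, `premise`, `verified_by`) are all assumptions made for the example. It only shows the general shape of the pipeline: segment a trace into steps, attach labeled relations, and check that the result is acyclic.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class ReasoningGraph:
    """A reasoning trajectory as a labeled graph (hypothetical structure)."""

    # step id -> {"text": ..., "label": ...}
    steps: dict[str, dict[str, str]] = field(default_factory=dict)
    # (source id, target id) -> relation label
    relations: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_step(self, step_id: str, text: str, label: str) -> None:
        self.steps[step_id] = {"text": text, "label": label}

    def add_relation(self, source: str, target: str, relation: str) -> None:
        if source not in self.steps or target not in self.steps:
            raise KeyError("both endpoints must be existing steps")
        self.relations[(source, target)] = relation

    def topological_order(self) -> list[str]:
        """Return steps so that every source precedes its targets."""
        sorter = TopologicalSorter()
        for step_id in self.steps:
            sorter.add(step_id)
        for source, target in self.relations:
            sorter.add(target, source)  # target depends on source
        return list(sorter.static_order())  # raises CycleError on a cycle

    def is_dag(self) -> bool:
        try:
            self.topological_order()
        except CycleError:
            return False
        return True


def segment_steps(trace: str) -> list[str]:
    """Naive segmentation: one step per non-empty line (an assumption)."""
    return [line.strip() for line in trace.splitlines() if line.strip()]


trace = """Let x be the number of apples.
Then 2x = 10.
So x = 5.
Check: 2 * 5 = 10."""

graph = ReasoningGraph()
labels = ["assumption", "deduction", "deduction", "verification"]
for i, (text, label) in enumerate(zip(segment_steps(trace), labels), start=1):
    graph.add_step(f"s{i}", text, label)

graph.add_relation("s1", "s2", "premise")
graph.add_relation("s2", "s3", "premise")
graph.add_relation("s3", "s4", "verified_by")

print(graph.topological_order())
print(graph.is_dag())
```

Storing relations as labeled pairs keeps the structure compatible with standard graph algorithms, which is what the computability principle asks for.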

## Data Construction and Annotation Process

Data construction is divided into two phases:
1. **Manual Annotation Validation**: Professional annotators label 31 reasoning trajectories (about 2,100 steps) with step function types, dependencies, errors, and related attributes; consistency checks confirm the reliability of the annotation scheme;
2. **Large-Scale Automatic Annotation**: An automated pipeline modeled on the manual annotations is applied to 1,260 trajectories (247,700 steps). The data covers three domains (mathematical reasoning, scientific Q&A, and argumentation analysis) and models such as Qwen2.5-32B-Inst and DeepSeek-R1.
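
The source does not say which statistic the consistency checks use. One common choice for comparing two annotators' step labels is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes that setup; the function `cohen_kappa` and the label names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same steps."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of steps where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[lab] * count_b[lab] for lab in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical step labels from two annotators on the same four steps.
annotator_a = ["plan", "deduce", "verify", "deduce"]
annotator_b = ["plan", "deduce", "deduce", "deduce"]

print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.556
```

Here the annotators agree on 3 of 4 steps (0.75), chance agreement is 7/16, and kappa is 5/9.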

## Key Research Findings

Main findings:
1. **Similarity in Model Reasoning Structures**: The reasoning trajectory structures of models with different architectures and training data are surprisingly similar, suggesting that reasoning behavior converges largely independently of architecture;
2. **Diversity in Fine-Grained Reasoning Behaviors**: At a finer grain, trajectories show distinct patterns such as local verification, self-reflection, and hypothesis management;
3. **Relationship Between Erroneous Steps and Answers**: Most erroneous steps are not used in the final answer, which reflects the models' fault tolerance and the limitations of traditional evaluation;
4. **Separation of Causal Dependencies and Discourse Structure**: Mechanistic causal dependencies between steps may not be visible at the linguistic level, so evaluation needs to consider both logical correctness and expressive coherence.
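
The third finding can be made concrete with a reachability check on the DAG. A step counts as "used" if the final answer depends on it directly or transitively, that is, if it is an ancestor of the final-answer node. The sketch below is one plausible way to operationalize this, not the method confirmed by the source; the function names, step ids, and the toy graph are assumptions.

```python
def ancestors(edges: list[tuple[str, str]], target: str) -> set[str]:
    """All nodes with a directed path to `target` (edges are source -> target)."""
    parents: dict[str, set[str]] = {}
    for source, dest in edges:
        parents.setdefault(dest, set()).add(source)

    seen: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def split_errors(
    edges: list[tuple[str, str]], erroneous: set[str], final_answer: str
) -> dict[str, set[str]]:
    """Partition erroneous steps by whether the final answer depends on them."""
    used = ancestors(edges, final_answer) | {final_answer}
    return {"used": erroneous & used, "unused": erroneous - used}


# Toy trajectory: s2 and s3 form an abandoned wrong attempt;
# the model backtracks to s1 and reaches the answer s5 via s4.
edges = [("s1", "s2"), ("s2", "s3"), ("s1", "s4"), ("s4", "s5")]

result = split_errors(edges, erroneous={"s2", "s3"}, final_answer="s5")
print(sorted(result["unused"]))  # ['s2', 's3']
print(sorted(result["used"]))    # []
```

In this toy case the erroneous branch is a dead end, so an evaluation that penalizes every wrong step would understate the quality of the final answer.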

## Application Prospects and Impact

Application directions:
- **Model Evaluation and Improvement**: Evaluate reasoning efficiency, diagnose erroneous steps, optimize training data;
- **Interpretability Enhancement**: Reasoning auditing (tracking conclusion paths), confidence estimation, adversarial detection;
- **Human-Machine Collaboration Optimization**: Identify intervention points, guide reasoning directions, integrate human knowledge.

## Open-Source Resources and Future Directions

- **Open-Source Resources**: Dataset (1260 DAG-annotated trajectories), annotation tools, visualization tools, analysis library, available at: https://github.com/jinulee-v/reasoningflow.
- **Limitations**: Language constraints (English-dominated), narrow task scope (not covering creative writing, etc.), automatic annotation accuracy needs improvement.
- **Future Directions**: Multilingual expansion, real-time reasoning monitoring, reasoning strategy learning, neuro-symbolic fusion.
