Zing Forum

ReasoningFlow: Uncovering the Hidden Logic of Large Language Model Reasoning Processes Using Discourse Structure Graphs

ReasoningFlow is a framework that represents the reasoning trajectories of large language models (LLMs) as directed acyclic graphs (DAGs). An analysis of 1,260 reasoning trajectories (about 247,000 steps) shows that different models reason with similar structures and that the relationship between erroneous steps and final answers is more complex than a linear reading suggests.

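Large language models, Reasoning trajectories, Interpretability, Directed acyclic graphs, Chain of thought, Model evaluation, Discourse structure, DeepSeek, Qwen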
Published 2026-06-04 04:12 | Recent activity 2026-06-05 16:51 | Estimated read 7 min

Section 01

Introduction: ReasoningFlow – A DAG Framework for Analyzing LLM Reasoning Trajectories

ReasoningFlow represents each LLM reasoning trajectory as a directed acyclic graph (DAG) instead of a linear chain of steps. The framework targets three open problems with large reasoning models (LRMs): reasoning that is hard to interpret, reasoning that is hard to monitor, and the lack of cross-model comparisons.

Section 02

Research Background and Challenges

Large reasoning models (e.g., DeepSeek-R1, QwQ-32B) solve complex problems by generating reasoning trajectories that include non-linear thinking processes like hypothesis proposal, verification, backtracking, and self-correction. However, there are three major challenges:

  1. Interpretability Dilemma: Traditional linear evaluation struggles to capture branching, looping, and correction behaviors in reasoning;
  2. Monitoring Difficulty: Lack of a systematic analysis framework to understand the impact of erroneous steps on final answers;
  3. Cross-Model Comparison: Uncertainty about whether there are commonalities or differences in reasoning processes between models with different architectures/training data.

Section 03

ReasoningFlow Framework and Technical Implementation

The ReasoningFlow framework models reasoning trajectories as DAGs, drawing on the concept of linguistic discourse structure, where steps are nodes and logical relationships are edges. Core design principles:

  • Non-linear modeling: Express complex patterns like hypothesis testing and backtracking;
  • Fine-grained analysis: Track step contributions and dependencies;
  • Computability: Support automated analysis via graph algorithms.

Technical implementation includes:

  1. DAG construction algorithms (step segmentation, relationship identification, graph building, attribute annotation);
  2. Visualization tools (interactive exploration, statistical summaries, comparative analysis of reasoning paths across different models).
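
To make the "steps are nodes, logical relationships are edges" idea concrete, here is a minimal Python sketch of such a graph. It is an illustration under assumptions, not the project's actual data model: the class name `ReasoningGraph`, the method names, and the step labels are all hypothetical. It only uses the standard-library `graphlib` module to check acyclicity and to order steps.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class ReasoningGraph:
    """Hypothetical container: one node per reasoning step."""
    # step id -> illustrative function label (e.g. "hypothesis")
    labels: dict[str, str] = field(default_factory=dict)
    # step id -> ids of the earlier steps it depends on
    premises: dict[str, set[str]] = field(default_factory=dict)

    def add_step(self, step_id, label, depends_on=()):
        self.labels[step_id] = label
        self.premises[step_id] = set(depends_on)

    def is_acyclic(self):
        # prepare() raises CycleError if the dependencies contain a cycle
        try:
            TopologicalSorter(self.premises).prepare()
            return True
        except CycleError:
            return False

    def topological_order(self):
        # Premises always come before the steps that use them
        return list(TopologicalSorter(self.premises).static_order())


g = ReasoningGraph()
g.add_step("s1", "problem_restatement")
g.add_step("s2", "hypothesis", ["s1"])
g.add_step("s3", "verification", ["s2"])
g.add_step("s4", "conclusion", ["s1", "s3"])
order = g.topological_order()
print(order)
```

Storing each step's premises (incoming edges) rather than its successors matches the input shape `TopologicalSorter` expects and makes "what does this step rely on?" a direct lookup.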

Section 04

Data Construction and Annotation Process

Data construction is divided into two phases:

  1. Manual Annotation Validation: 31 reasoning trajectories (about 2,100 steps) were labeled by professional annotators for step function type, dependencies, errors, and related attributes; consistency checks were used to confirm that the annotation scheme is reliable;
  2. Large-Scale Automatic Annotation: An automated pipeline modeled on the manual annotation scheme was applied to 1,260 trajectories (247,700 steps). The data covers three domains (mathematical reasoning, scientific Q&A, and argumentation analysis) and models such as Qwen2.5-32B-Inst and DeepSeek-R1.
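
The text does not say which statistic the consistency checks use. As one common option, the sketch below computes Cohen's kappa between two annotators' step-type labels. The choice of kappa, the function name `cohen_kappa`, and the label values are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same steps."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of steps on which both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative labels for four steps (not taken from the dataset)
annotator_1 = ["plan", "verify", "verify", "conclude"]
annotator_2 = ["plan", "verify", "reflect", "conclude"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 3 of 4 steps (0.75 observed) against 0.25 expected by chance, so kappa is (0.75 - 0.25) / (1 - 0.25), or about 0.667.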

Section 05

Key Research Findings

Main findings:

  1. Similarity in Model Reasoning Structures: Models with different architectures and training data produce surprisingly similar trajectory structures, which suggests that reasoning behavior converges largely independently of architecture;
  2. Diversity in Fine-Grained Reasoning Behaviors: Trajectories exhibit varied fine-grained patterns, such as local verification, self-reflection, and hypothesis management;
  3. Relationship Between Erroneous Steps and Answers: Most erroneous steps are not used in the final answer, reflecting the model's fault tolerance and the limitations of traditional evaluation;
  4. Separation of Causal Dependencies and Discourse Structure: Mechanical causal dependencies may not be reflected at the linguistic level; evaluation needs to consider both logical correctness and expressive coherence.
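
Finding 3 can be read as a reachability question on the graph: an erroneous step matters for the answer only if the answer depends on it, directly or transitively. The sketch below shows that check on a toy trajectory. The function names, the dictionary layout (step id mapped to its premises), and the example steps are hypothetical and are not taken from the released analysis library.

```python
def ancestors(premises, target):
    """All steps that `target` depends on, directly or transitively."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in premises.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def split_errors(premises, erroneous, answer):
    """Separate erroneous steps the answer relies on from abandoned ones."""
    used = ancestors(premises, answer) | {answer}
    return sorted(erroneous & used), sorted(erroneous - used)


# Toy trajectory: s3 is a wrong attempt that the model abandons,
# then it restarts from s1 via s4 and reaches the answer s5.
premises = {
    "s1": [],
    "s2": ["s1"],
    "s3": ["s2"],
    "s4": ["s1"],
    "s5": ["s4"],
}
used_errors, unused_errors = split_errors(premises, {"s3"}, "s5")
print(used_errors, unused_errors)
```

In this toy case the erroneous step `s3` is not an ancestor of `s5`, so it lands in the unused list, which is the situation the finding describes as typical.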

Section 06

Application Prospects and Impact

Application directions:

  • Model Evaluation and Improvement: Evaluate reasoning efficiency, diagnose erroneous steps, optimize training data;
  • Interpretability Enhancement: Reasoning auditing (tracking conclusion paths), confidence estimation, adversarial detection;
  • Human-Machine Collaboration Optimization: Identify intervention points, guide reasoning directions, integrate human knowledge.
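
Reasoning auditing, in the sense of tracking how a conclusion was reached, can be sketched as path enumeration over the dependency graph. The example below is a minimal illustration under assumptions: the function name `conclusion_paths` and the toy graph are hypothetical, and the recursion relies on the graph being acyclic.

```python
def conclusion_paths(premises, start, conclusion):
    """Every dependency path from `start` to `conclusion` in a DAG."""
    if conclusion == start:
        return [[start]]
    paths = []
    # Walk backwards through each premise of the current step
    for parent in premises.get(conclusion, ()):
        for path in conclusion_paths(premises, start, parent):
            paths.append(path + [conclusion])
    return paths


# Toy graph: the conclusion "c" draws on two independent lines of reasoning
premises = {
    "q": [],
    "a": ["q"],
    "b": ["q"],
    "c": ["a", "b"],
}
paths = conclusion_paths(premises, "q", "c")
for path in paths:
    print(" -> ".join(path))
```

The number of paths can grow exponentially in densely connected graphs, so an auditing tool built on this idea would likely cap the path count or report only the shortest paths.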

Section 07

Open-Source Resources and Future Directions

  • Open-Source Resources: Dataset (1,260 DAG-annotated trajectories), annotation tools, visualization tools, and an analysis library, available at: https://github.com/jinulee-v/reasoningflow
  • Limitations: Language constraints (English-dominated), narrow task scope (not covering creative writing, etc.), and automatic annotation accuracy that still needs improvement.
  • Future Directions: Multilingual expansion, real-time reasoning monitoring, reasoning strategy learning, and neuro-symbolic fusion.