ReasoningFlow: Uncovering the Hidden Logic of Large Language Model Reasoning Processes Using Discourse Structure Graphs

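Tags: large language models, reasoning trajectories, interpretability, directed acyclic graphs, chain of thought, model evaluation, discourse structure, DeepSeek, Qwen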

Published 2026-06-04 04:12 | Recent activity 2026-06-05 16:51 | Estimated read 7 min

Section 01

Introduction: ReasoningFlow – A DAG Framework for Analyzing LLM Reasoning Trajectories

ReasoningFlow is a framework that represents the reasoning trajectories of large language models (LLMs) as directed acyclic graphs (DAGs). An analysis of 1,260 reasoning trajectories (roughly 247,000 steps) reveals structural similarities in reasoning across models and a complex relationship between erroneous steps and final answers. The framework targets three problems with large reasoning models (LRMs): limited interpretability, difficulty of monitoring, and the lack of cross-model comparison.

Section 02

Research Background and Challenges

Large reasoning models (e.g., DeepSeek-R1, QwQ-32B) solve complex problems by generating reasoning trajectories that include non-linear thinking processes like hypothesis proposal, verification, backtracking, and self-correction. However, there are three major challenges:

Interpretability Dilemma: Traditional linear evaluation struggles to capture branching, looping, and correction behaviors in reasoning;
Monitoring Difficulty: Lack of a systematic analysis framework to understand the impact of erroneous steps on final answers;
Cross-Model Comparison: Uncertainty about whether there are commonalities or differences in reasoning processes between models with different architectures/training data.

Section 03

ReasoningFlow Framework and Technical Implementation

Drawing on the linguistic concept of discourse structure, the ReasoningFlow framework models each reasoning trajectory as a DAG in which steps are nodes and the logical relationships between them are edges. Core design principles:

Non-linear modeling: Express complex patterns like hypothesis testing and backtracking;
Fine-grained analysis: Track step contributions and dependencies;
Computability: Support automated analysis via graph algorithms.

Technical implementation includes:

DAG construction algorithms (step segmentation, relationship identification, graph building, attribute annotation);
Visualization tools (interactive exploration, statistical summaries, comparative analysis of reasoning paths across different models).
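The construction stages listed above (step segmentation, relationship identification, graph building, attribute annotation) can be sketched with Python's standard library alone. This is only an illustration under stated assumptions: the sentence-level segmenter is deliberately naive, the relations are supplied by hand rather than identified automatically, and the names Step, segment, build_dag, and the relation labels are hypothetical, not the project's API.

```python
import re
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Step:
    """One node of the graph: a segment of the reasoning trajectory."""
    idx: int
    text: str
    attrs: dict = field(default_factory=dict)  # e.g. function type, error flag


def segment(trajectory: str) -> list[Step]:
    """Naive step segmentation (assumption): split after ., ! or ?."""
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", trajectory) if p.strip()]
    return [Step(i, p) for i, p in enumerate(parts)]


def build_dag(steps, relations):
    """Build the graph from (source_idx, target_idx, label) triples.

    Returns (preds, order): preds maps each step to {predecessor: label},
    and order is a topological ordering of the step indices.
    Raises ValueError on unknown steps or on a cycle.
    """
    known = {s.idx for s in steps}
    preds = {s.idx: {} for s in steps}
    for src, dst, label in relations:
        if src not in known or dst not in known:
            raise ValueError(f"edge ({src}, {dst}) references an unknown step")
        preds[dst][src] = label
    try:
        # static_order() raises CycleError when the graph is not acyclic.
        sorter = TopologicalSorter({k: set(v) for k, v in preds.items()})
        order = list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("relations contain a cycle") from exc
    return preds, order


# Toy trajectory with a verification branch (hand-written example data).
steps = segment("Let x = 2. Then x + 3 = 5. Check: 5 - 3 = 2. So the answer is 5.")
steps[2].attrs["function"] = "verification"  # attribute annotation
relations = [(0, 1, "premise"), (1, 2, "verify"), (1, 3, "premise")]
preds, order = build_dag(steps, relations)
```

Storing predecessors rather than successors keeps the later question "which steps does this conclusion depend on?" a simple backward traversal, and the topological sort doubles as the acyclicity check.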

Section 04

Data Construction and Annotation Process

Data construction is divided into two phases:

Manual Annotation Validation: 31 reasoning trajectories (about 2100 steps), where professional annotators label step function types, dependencies, errors, etc., and consistency checks ensure the reliability of the scheme;
Large-Scale Automatic Annotation: An automated pipeline modeled on the manual annotation scheme, applied to 1,260 trajectories (247,700 steps). The data covers three domains (mathematical reasoning, scientific Q&A, and argumentation analysis) and models such as Qwen2.5-32B-Inst and DeepSeek-R1.
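The text does not say which agreement measure the consistency checks use. As one common option, and purely as an assumption, the sketch below computes Cohen's kappa between two annotators' step-type labels. The function name and the label values are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same steps."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of steps given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both labeled independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical step-function labels from two annotators.
annotator_1 = ["plan", "deduce", "verify", "deduce"]
annotator_2 = ["plan", "deduce", "deduce", "deduce"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

For this toy pair the annotators agree on three of four steps, and the chance-corrected score works out to 5/9. The same check could be run between the manual labels and the automatic pipeline's output on the 31 validated trajectories.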

Section 05

Key Research Findings

Main findings:

Similarity in Model Reasoning Structures: The reasoning trajectory structures of models with different architectures/training data are surprisingly similar, which suggests that reasoning behavior converges largely independently of architecture;
Diversity in Fine-Grained Reasoning Behaviors: Trajectories exhibit fine-grained patterns such as local verification, self-reflection, and hypothesis management;
Relationship Between Erroneous Steps and Answers: Most erroneous steps are not used in the final answer, reflecting the model's fault tolerance and the limitations of traditional evaluation;
Separation of Causal Dependencies and Discourse Structure: Mechanical causal dependencies may not be reflected at the linguistic level; evaluation needs to consider both logical correctness and expressive coherence.
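The finding about erroneous steps can be read as a reachability question on the graph: an erroneous step is "used" only if the final answer transitively depends on it. The sketch below is one way to measure that, assuming a trajectory is stored as a mapping from each step to the steps it depends on. The function names and the toy data are hypothetical and do not reproduce the study's own procedure.

```python
def ancestors(preds, target):
    """Return every step that the target transitively depends on."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in preds.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unused_error_fraction(preds, error_steps, final_step):
    """Share of erroneous steps that the final answer does not depend on."""
    errors = set(error_steps)
    if not errors:
        return 0.0
    used = ancestors(preds, final_step) | {final_step}
    return len(errors - used) / len(errors)


# Toy trajectory: steps 2 and 3 form an abandoned branch; step 4 is the answer.
preds = {0: [], 1: [0], 2: [0], 3: [2], 4: [1]}
fraction = unused_error_fraction(preds, error_steps=[1, 2, 3], final_step=4)
```

Here two of the three erroneous steps sit on the abandoned branch, so the fraction is 2/3. A step-level accuracy score would penalize all three equally, which is the limitation of traditional evaluation that the finding points to.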

Section 06

Application Prospects and Impact

Application directions:

Model Evaluation and Improvement: Evaluate reasoning efficiency, diagnose erroneous steps, optimize training data;
Interpretability Enhancement: Reasoning auditing (tracking conclusion paths), confidence estimation, adversarial detection;
Human-Machine Collaboration Optimization: Identify intervention points, guide reasoning directions, integrate human knowledge.

Section 07

Open-Source Resources and Future Directions

Open-Source Resources: Dataset (1,260 DAG-annotated trajectories), annotation tools, visualization tools, and an analysis library, available at https://github.com/jinulee-v/reasoningflow.
Limitations: Language constraints (English-dominated), narrow task scope (not covering creative writing, etc.), and automatic annotation accuracy that needs improvement.
Future Directions: Multilingual expansion, real-time reasoning monitoring, reasoning strategy learning, and neuro-symbolic fusion.
