Zing Forum

NeurIPS 2025 Paper Supporting Data Released: Unveiling Termination Misalignment in Large Reasoning Models

A research dataset targeting the termination misalignment problem in Large Reasoning Models (LRMs), including systematic evaluation results to help researchers understand when and why models stop reasoning.

Tags: Large Reasoning Models, Termination Misalignment, NeurIPS, Chain-of-Thought, o1, DeepSeek-R1, Model Evaluation, Reasoning Optimization
Published 2026-04-25 17:43 | Recent activity 2026-04-25 17:48 | Estimated read 6 min

Section 01

NeurIPS 2025 Paper Supporting Data Released: Focus on Termination Misalignment in Large Reasoning Models

The dataset trm-data-neurips, which accompanies the NeurIPS 2025 paper "Termination Misalignment in Large Reasoning Models", has been released. It contains systematic evaluation results that help researchers understand when and why Large Reasoning Models (LRMs) stop reasoning, and it provides a benchmark for follow-up model optimization and research.

Section 02

Research Background: Rise of Reasoning Models and Neglect of Termination Issues

With the rise of reasoning models such as OpenAI o1 and DeepSeek-R1, large language models have demonstrated human-like Chain-of-Thought capabilities and significantly improved performance on tasks such as mathematics and programming. However, the critical question of when a model should stop thinking is often overlooked, and it has become a potential bottleneck in the development of current reasoning models.

Section 03

Definition of Termination Misalignment: Three Core Manifestations

Termination misalignment refers to the inconsistency between a model's internal reasoning process and its final output, with specific manifestations as follows:

  1. Premature termination: Giving a conclusion without fully exploring solutions
  2. Over-reasoning: Continuing unnecessary computations even after finding the correct answer
  3. Disconnection between reasoning and conclusion: Logical mismatch between intermediate steps and the final answer

This phenomenon reduces model efficiency and can even lead to incorrect outputs.
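
To make the three manifestations concrete, here is a minimal Python sketch of how a single reasoning trace could be labeled. It is an illustration only, not the paper's method: the `Trace` fields, the `slack` threshold, and the label names are all hypothetical, and "the model never reached the correct answer" is used as a crude proxy for premature termination.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    # Hypothetical schema: the answer the model would commit to after each
    # reasoning step (None if no answer yet), plus final and reference answers.
    step_answers: List[Optional[str]]
    final_answer: str
    gold_answer: str


def classify_termination(trace: Trace, slack: int = 2) -> str:
    """Return 'disconnect', 'premature', 'over_reasoning', or 'aligned'."""
    answers = trace.step_answers
    last_step_answer = answers[-1] if answers else None

    # Disconnect: the final output disagrees with the last reasoning step.
    if last_step_answer is not None and last_step_answer != trace.final_answer:
        return "disconnect"

    # Index of the first step at which the correct answer appears, if any.
    first_correct = next(
        (i for i, a in enumerate(answers) if a == trace.gold_answer), None
    )

    # Crude proxy: stopping before ever reaching the correct answer.
    if first_correct is None:
        return "premature"

    # Over-reasoning: more than `slack` extra steps after the first correct answer.
    if len(answers) - 1 - first_correct > slack:
        return "over_reasoning"

    return "aligned"
```

For example, a trace whose step answers are `["3", "5", "5", "5", "5", "5"]` with gold answer `"5"` keeps reasoning four steps past the first correct answer, so with the default `slack` of 2 it is labeled `over_reasoning`.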

Section 04

Dataset Content: Multi-Model and Multi-Dimensional Evaluation Data

The dataset trm-data-neurips includes:

  • Multi-model comparison: Covers OpenAI o1 series, DeepSeek-R1 and its variants, QwQ, and other open-source models
  • Multi-dimensional scenarios: Mathematical reasoning (AIME/AMC), code generation, logic puzzles, scientific Q&A
  • Fine-grained metrics: Relationship between number of reasoning steps and correct answers, correlation between termination timing and difficulty, impact of prompt strategies on termination behavior
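
The fine-grained metrics listed above could be computed along the following lines. This is a sketch under stated assumptions: the post does not describe the dataset's file format, so the record keys (`model`, `difficulty`, `reasoning_steps`, `correct`) are hypothetical and would need to be mapped to the real schema.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def summarize(records):
    """Group hypothetical evaluation records by model and compute basic metrics."""
    by_model = defaultdict(list)
    for r in records:
        by_model[r["model"]].append(r)

    summary = {}
    for model, rows in by_model.items():
        steps = [r["reasoning_steps"] for r in rows]
        difficulty = [r["difficulty"] for r in rows]
        summary[model] = {
            # Fraction of problems answered correctly.
            "accuracy": mean(1.0 if r["correct"] else 0.0 for r in rows),
            # Average reasoning length before termination.
            "mean_steps": mean(steps),
            # How strongly termination timing tracks problem difficulty.
            "steps_vs_difficulty": pearson(difficulty, steps),
        }
    return summary
```

A correlation near zero for a given model would suggest that its stopping point does not adapt to problem difficulty, which is one way termination misalignment could show up in aggregate.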

Section 05

Research Significance: Dual Value for Developers and the Community

Insights for Developers:

  1. Training strategy: Need to introduce fine-grained reward mechanisms to reward efficient reasoning processes
  2. Reasoning control: Replace fixed thinking budgets with dynamic termination mechanisms
  3. Interpretability: Understanding termination behavior improves model interpretability

Contributions to the Community: Provides benchmarks to support the development of termination judgment algorithms, systematic model comparisons, and reliable model training.
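
As an illustration of the "dynamic termination instead of a fixed thinking budget" idea in the developer insights, the sketch below stops once the candidate answer has stayed the same for a few consecutive steps. This answer-stability rule is one simple heuristic chosen for the example, not a mechanism described in the paper; the function name and the `patience` and `max_steps` parameters are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def run_with_dynamic_stop(
    step_answers: Iterable[Optional[str]],
    patience: int = 3,
    max_steps: int = 64,
) -> Tuple[Optional[str], int]:
    """Consume per-step candidate answers and stop once they stabilize.

    Returns the answer held at the stopping point and the number of steps used.
    """
    current: Optional[str] = None
    streak = 0       # consecutive steps with the same non-empty answer
    steps_used = 0

    for answer in step_answers:
        if steps_used >= max_steps:
            break    # hard cap, still needed as a safety net
        steps_used += 1

        if answer is not None and answer == current:
            streak += 1
        else:
            current = answer
            streak = 1 if answer is not None else 0

        if streak >= patience:
            break    # answer has been stable long enough; stop early

    return current, steps_used
```

With the step answers `[None, "7", "8", "8", "8", "8", "8"]` and `patience=3`, the loop stops after step 5 and returns `("8", 5)`, whereas a fixed budget of 7 steps would have spent two more steps on an answer that was no longer changing.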

Section 06

Practical Applications: Cost Optimization and Fine-Tuning Guidance

API Cost Optimization: Understanding termination misalignment can reduce the tokens consumed by over-reasoning and lower the hidden cost of errors.

Model Fine-Tuning Guidance: Helps design reasonable reward functions for reasoning length, develop early-stopping detection mechanisms, and optimize prompts to guide appropriate termination timing.
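
A reasoning-length reward function of the kind mentioned above might look like the following sketch. The shape of the reward is an assumption made for illustration (the post does not specify one): a correct answer earns 1.0 minus a small penalty that grows with the fraction of the token budget used, and a wrong answer earns 0.0 regardless of length, so that stopping early while wrong is never rewarded. The names `token_budget` and `penalty_weight` are hypothetical.

```python
def length_aware_reward(
    correct: bool,
    tokens_used: int,
    token_budget: int,
    penalty_weight: float = 0.2,
) -> float:
    """Reward correctness, with a capped penalty for long reasoning."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")

    # Wrong answers get no reward, however short the reasoning was.
    if not correct:
        return 0.0

    # Fraction of the budget consumed, capped at 1.0 so the penalty is bounded.
    usage = min(tokens_used / token_budget, 1.0)
    return 1.0 - penalty_weight * usage
```

Keeping `penalty_weight` well below 1.0 means a long correct answer still scores higher than any wrong answer, which avoids pushing the model toward premature termination while still discouraging over-reasoning.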

Section 07

Dataset Usage Guide: Reproduction, Expansion, and Innovation

Researchers can use the dataset to:

  1. Reproduce paper results to validate original findings
  2. Expand evaluations by adding new models or test scenarios
  3. Develop better termination judgment models
  4. Compare their own models with existing benchmarks.

Section 08

Conclusion: Importance of Termination Misalignment and Future Outlook

Termination misalignment is an important but under-researched topic in the field of reasoning models. As models are increasingly applied in critical areas such as healthcare and law, ensuring that models give the correct answer at the right time is crucial. We look forward to the community developing more intelligent and reliable reasoning models based on this dataset.