# DiScO: Enhancing Reasoning Capabilities of Large Language Models via Diverse Thinking Schemata

> This article introduces the DiScO framework, which enhances the diversity of thinking schemata through reinforcement learning, enabling large language models to perform better on mathematical reasoning tasks and recover more effectively from erroneous attempts.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-08T03:17:31.000Z
- Last activity: 2026-06-09T02:49:07.172Z
- Popularity: 127.5
- Keywords: large language models, reasoning models, thinking schemata, reinforcement learning, policy optimization, mathematical reasoning, diversity, DiScO
- Page link: https://www.zingnex.cn/en/forum/thread/disco-2e8ac7a3
- Canonical: https://www.zingnex.cn/forum/thread/disco-2e8ac7a3
- Markdown source: floors_fallback

---

## Introduction: DiScO and Diverse Thinking Schemata

- This article introduces DiScO (Diverse Schemata Policy Optimization), a framework that uses reinforcement learning to diversify a model's thinking schemata. The goals are better performance on mathematical reasoning tasks and a stronger ability to recover from erroneous attempts.
- Source information: The original work is the arXiv paper "Diverse Thinking Schemata Elicit Better Reasoning in Large Language Models", link: http://arxiv.org/abs/2606.08974v1, publication time: 2026-06-08T03:17:31Z.
- Core value: The work presents scaling diversity as an effective path to stronger model capabilities and offers new ideas for the design of next-generation reasoning models.

## Research Background: The Rise of Reasoning Models and the Diversity Bottleneck

In recent years, large reasoning models (LRMs) have performed well on complex mathematical problems, improving accuracy by generating explicit reasoning chains. However, mainstream training methods such as GRPO (Group Relative Policy Optimization) focus on the correctness of the final answer and ignore the diversity of the reasoning process. Studies have found that models able to generate diverse reasoning paths solve problems more reliably and are more robust. The core question is therefore how to enhance reasoning diversity systematically.

## Core Concepts: Two Key Dimensions of Thinking Schemata

The paper proposes the notion of "thinking schemata", which describes the reasoning process along two dimensions:
1. **Reasoning Transition**: How the model moves from one reasoning step to the next (e.g., from induction to deduction, or from trial-and-error to verification). The quality and diversity of these transitions affect the flexibility and depth of reasoning.
2. **Answer Candidates**: The different candidate solutions explored during reasoning. Exploring several candidates in parallel makes it easier to select the best one.

The diversity of thinking schemata is reported to be significantly and positively correlated with model performance.
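
As a rough illustration of how the two dimensions could be quantified, the sketch below scores one reasoning trace by the Shannon entropy of its transition labels and by the number of distinct answer candidates it explores. The label names, the trace layout, and both functions are hypothetical and chosen only for this example; the source does not specify how the paper annotates or scores schemata.

```python
import math
from collections import Counter


def transition_entropy(transitions):
    """Shannon entropy (in bits) of the transition labels in one trace.

    Higher values mean the trace mixes more kinds of reasoning transitions.
    """
    counts = Counter(transitions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def distinct_candidates(candidates):
    """Number of distinct answer candidates explored in one trace."""
    return len({c.strip() for c in candidates})


# Hypothetical annotated trace: labels and answers are made up for illustration.
trace = {
    "transitions": ["deduce", "verify", "backtrack", "deduce"],
    "candidates": ["42", "41", "42"],
}
print(transition_entropy(trace["transitions"]))  # 1.5
print(distinct_candidates(trace["candidates"]))  # 2
```

Entropy is used here only because it is a standard way to summarize how evenly a set of categories is used; any other spread measure could be substituted.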

## DiScO Framework: Three-Stage Diversity Enhancement Strategy

The DiScO framework enhances the diversity of thinking schemata through three stages:
1. **Schema Awareness**: Train the model to recognize and distinguish different thinking schemata, laying the foundation for subsequent optimization.
2. **Diversity Reinforcement Learning**: Introduce a diversity reward mechanism; in addition to correctness rewards, the model receives extra rewards for generating different reasoning paths, encouraging exploration of a broader reasoning space.
3. **Diversity at Inference Time**: Use decoding techniques such as temperature sampling and nucleus (top-p) sampling so that reasoning diversity is preserved at deployment.
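
The second stage can be pictured with a minimal sketch: each sampled response gets its correctness reward plus a bonus for differing from the other samples in its group, and the shaped rewards are then standardized within the group in the style of GRPO's group-relative advantages. Everything here is an assumption made for illustration: the token-level Jaccard distance stands in for whatever diversity signal DiScO actually uses, and `shaped_rewards`, `group_advantages`, and `diversity_weight` are hypothetical names, not the paper's.

```python
from statistics import mean, pstdev


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over whitespace tokens (crude diversity proxy)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def shaped_rewards(responses, correct, diversity_weight=0.1):
    """Correctness reward plus a bonus for differing from the rest of the group."""
    rewards = []
    for i, resp in enumerate(responses):
        distances = [jaccard_distance(resp, other)
                     for j, other in enumerate(responses) if j != i]
        bonus = mean(distances) if distances else 0.0
        rewards.append(float(correct[i]) + diversity_weight * bonus)
    return rewards


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's sample group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three correct samples for one prompt; the third takes a different route.
group = [
    "factor the quadratic then solve",
    "factor the quadratic then solve",
    "complete the square then solve",
]
rewards = shaped_rewards(group, correct=[True, True, True])
advantages = group_advantages(rewards)
print(rewards)
print(advantages)
```

In this toy group all three answers are correct, so correctness alone would give every sample the same advantage; the bonus is what lets the less common route stand out. Whether the bonus should also apply to incorrect responses is a design choice this sketch leaves open.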

## Experimental Results: Improvements in Accuracy, Error Recovery, and Robustness

Evaluation results on mathematical reasoning benchmarks:
- **Accuracy Improvement**: DiScO consistently outperforms the GRPO baseline, with a stable advantage across multiple datasets.
- **Error Recovery Capability**: Manual annotation analysis shows that DiScO significantly improves the model's ability to recover from erroneous initial attempts through self-correction and strategy adjustment.
- **Robustness Verification**: DiScO is more robust on out-of-distribution problems, which supports the value of diverse thinking schemata.

## Technical Details: Diversity Measurement and Training Stability

- **Diversity Measurement**: Combines the edit distance between reasoning paths with their semantic similarity into a composite metric, so that the score reflects differences in both surface form and meaning.
- **Training Stability**: Adaptive weight adjustment and gradient clipping keep training stable while the diversity objective is pursued.
- **Computational Efficiency**: Diversity evaluation is mainly performed during the policy sampling phase, resulting in limited additional computational overhead.
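
A composite diversity score of the kind described above might look like the following sketch. It blends a surface-level term with a semantic term over all pairs of sampled reasoning paths. The specifics are assumptions: `difflib.SequenceMatcher` serves as a proxy for normalized edit distance, a bag-of-words cosine stands in for embedding-based semantic similarity, and the mixing weight `alpha` is a hypothetical parameter. The source does not give the paper's actual formula.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def edit_dissimilarity(a, b):
    """1 - difflib similarity ratio over token sequences (edit-distance proxy)."""
    matcher = SequenceMatcher(None, a.split(), b.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def cosine_dissimilarity(a, b):
    """1 - cosine similarity of bag-of-words counts (placeholder for embeddings)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    if norm == 0:
        # Two empty texts are identical; an empty and a non-empty text are not.
        return 0.0 if ca == cb else 1.0
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - dot / norm)


def group_diversity(paths, alpha=0.5):
    """Mean pairwise blend of surface and semantic dissimilarity, in [0, 1]."""
    pairs = list(combinations(paths, 2))
    if not pairs:
        return 0.0
    scores = [alpha * edit_dissimilarity(a, b)
              + (1 - alpha) * cosine_dissimilarity(a, b)
              for a, b in pairs]
    return sum(scores) / len(scores)


paths = [
    "expand both sides then compare coefficients",
    "substitute x = 1 then check the remainder",
    "expand both sides then compare coefficients",
]
print(group_diversity(paths))
```

Because the score only needs the sampled texts, it can be computed alongside policy sampling, which is consistent with the limited overhead noted above. In practice the bag-of-words cosine would likely be replaced by a sentence-embedding model.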

## Research Significance and Future Directions

- **Research Significance**: The findings reach beyond mathematical reasoning: they suggest that scaling diversity is an effective path to stronger model capabilities. Future reasoning models should pursue "diverse reasoning paths" rather than just "longer reasoning chains".
- **Cross-Domain Potential**: The concept of thinking schemata may also apply to other complex reasoning domains such as code generation, scientific discovery, and creative writing.
- **Open Issues**: The optimal level of diversity, cross-task transfer, and the tension between diversity and consistency all need further exploration.
- **Conclusion**: DiScO opens a new path for improving the reasoning capabilities of large language models; cultivating diverse reasoning abilities is key to building robust intelligent agents.