# VibeThinker: The Big Logic of Small Models—Diversity-Driven Optimization Unleashes Large Model-Level Reasoning Capabilities

> Weibo AI's open-source VibeThinker-1.5B/3B achieves cutting-edge reasoning capabilities at an extremely low cost (only $7,800), outperforming models like DeepSeek R1 with 400x more parameters on math competition benchmarks such as AIME and HMMT, and proposes the Spectrum-to-Signal Principle (SSP) training paradigm.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-16T04:14:45.000Z
- Last activity: 2026-06-16T04:25:32.893Z
- Popularity: 150.8
- Keywords: reasoning model, small language model, knowledge distillation, reinforcement learning, mathematical reasoning, code generation, VibeThinker
- Page link: https://www.zingnex.cn/en/forum/thread/vibethinker
- Canonical: https://www.zingnex.cn/forum/thread/vibethinker
- Markdown source: floors_fallback

---

## VibeThinker: The Big Logic of Small Models—Achieving Large Model-Level Reasoning Capabilities at Low Cost

Weibo AI's open-source VibeThinker small models (1.5B and 3B) challenge the common assumption that only large models can reason well, and they do so on a very small training budget: the 1.5B model cost only $7,800 to train. Using the Spectrum-to-Signal Principle (SSP) training paradigm, they outperform large models such as DeepSeek R1 (which has over 400x more parameters than the 1.5B model) on math competitions such as AIME and HMMT, as well as on programming tasks, demonstrating the reasoning potential of small models.

## Project Background and Core Breakthroughs

VibeThinker was developed by the Weibo AI team. The 1.5B version was open-sourced in November 2025, and the 3B version followed in June 2026. The core breakthrough is reasoning performance that surpasses much larger models at a very low training cost: the 1.5B model cost $7,800 to train, against the $294K cited for DeepSeek R1 (roughly 38 times less, consistent with the quoted 30-60x saving). This changes the economics of reasoning models. The base model comes from the Qwen2.5-Coder series, so that the verifiability of code data can be used to cultivate rigorous reasoning.
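The cost and scale comparisons reduce to two ratios, which the short check below reproduces. The dollar figures and the 1.5B size are the ones quoted in the text; the 671B total parameter count for DeepSeek R1 is an assumption taken from its publicly reported size, not from this post.

```python
# Figures quoted in the text.
VIBETHINKER_COST_USD = 7_800
DEEPSEEK_R1_COST_USD = 294_000
VIBETHINKER_PARAMS_B = 1.5

# Assumption: DeepSeek R1's publicly reported total parameter count (billions).
DEEPSEEK_R1_PARAMS_B = 671.0


def ratio(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    if smaller <= 0:
        raise ValueError("smaller must be positive")
    return larger / smaller


cost_ratio = ratio(DEEPSEEK_R1_COST_USD, VIBETHINKER_COST_USD)
param_ratio = ratio(DEEPSEEK_R1_PARAMS_B, VIBETHINKER_PARAMS_B)

print(f"training cost ratio: {cost_ratio:.1f}x")  # training cost ratio: 37.7x
print(f"parameter ratio: {param_ratio:.0f}x")     # parameter ratio: 447x
```

Under these inputs the cost gap falls inside the quoted 30-60x range, and the parameter gap is consistent with the "400x" shorthand.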

## Core Technologies: SSP Training Paradigm and CLR Strategy

The core innovation of VibeThinker is the SSP training paradigm:

1. **Diversity Exploration Distillation**: Generate diverse reasoning trajectory "spectra" during the SFT phase to ensure the model covers multiple problem-solving approaches.
2. **Signal Amplification**: Strengthen correct "signals" from the spectra via MaxEnt-Guided Policy Optimization (MGPO) during the RL phase.

The 3B version upgrades the SSP process (enhanced data synthesis, multi-domain RL, long context retention, etc.) and introduces the Claim-Level Reliability Assessment (CLR) strategy, which performs reliability assessment on each claim during reasoning to correct errors, significantly improving accuracy.
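The post does not spell out how MGPO amplifies signals, so the following is only an illustrative sketch of one way a maximum-entropy-guided weighting could work. It assumes that each problem is weighted by how close the model's empirical accuracy on it is to 0.5 (the point of maximum uncertainty), using `exp(-lam * KL(p || 0.5))`, and that this weight scales a group-relative advantage. The function names, the `lam` parameter, and the exact formula are assumptions for illustration, not the authors' implementation.

```python
import math


def binary_kl(p: float, p0: float = 0.5, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(p0)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite at 0 and 1
    return p * math.log(p / p0) + (1.0 - p) * math.log((1.0 - p) / (1.0 - p0))


def maxent_weight(num_correct: int, num_rollouts: int, lam: float = 1.0) -> float:
    """Hypothetical weight: largest when accuracy is 0.5, smaller near 0 or 1."""
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    if not 0 <= num_correct <= num_rollouts:
        raise ValueError("num_correct must lie in [0, num_rollouts]")
    accuracy = num_correct / num_rollouts
    return math.exp(-lam * binary_kl(accuracy))


def weighted_advantages(rewards: list[float], lam: float = 1.0) -> list[float]:
    """Group-normalized advantages for one problem, scaled by the MaxEnt weight.

    `rewards` holds one binary reward (1.0 correct, 0.0 wrong) per rollout.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("rewards must not be empty")
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0.0:
        # All rollouts agree, so the group carries no learning signal.
        return [0.0] * n
    weight = maxent_weight(int(sum(rewards)), n, lam)
    return [weight * (r - mean) / std for r in rewards]


if __name__ == "__main__":
    for correct in (0, 2, 4, 6, 8):
        print(correct, round(maxent_weight(correct, 8), 3))
```

The design intent of such a weighting is that problems the model always solves or always fails contribute little, while problems it solves about half the time, where the spectrum of attempts is most informative, dominate the update.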

## Performance Evidence: Practical Comparison Between Small and Large Models

The reported benchmark results support the case for small models:
- **1.5B version**: AIME24 (80.3 vs DeepSeek R1's 79.8) and HMMT25 (50.4 vs 41.7), outperforming DeepSeek R1 despite its 400x more parameters;
- **3B version**: AIME26 (94.3, rising to 97.1 with CLR), HMMT25 (89.3, rising to 95.4 with CLR), LiveCodeBench v6 (80.2 Pass@1), and a 96.1% acceptance rate on recent LeetCode problems, placing it among the strongest reported results on these benchmarks.
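The post does not say how the Pass@1 figure was computed. For reference, a widely used approach is the unbiased pass@k estimator: sample `n` solutions per problem, count the `c` that pass, and estimate the probability that at least one of `k` randomly chosen samples is correct. The sketch below assumes that estimator; for `k = 1` it reduces to the plain fraction `c / n`.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must lie in [0, n]")
    if not 1 <= k <= n:
        raise ValueError("k must lie in [1, n]")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    print(pass_at_k(10, 3, 1))
    print(mean_pass_at_k([(10, 3), (10, 10), (10, 0)], k=1))
```

A benchmark score such as "80.2 Pass@1" would then correspond to `mean_pass_at_k(..., k=1)` expressed as a percentage, under the assumption that this estimator was used.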

## Application Scenarios and Limitations

- **Recommended scenarios**: Competition-level math problems (AIME/HMMT), programming competitions (LeetCode/LiveCodeBench), STEM reasoning, and instruction-following tasks.
- **Limitations**: Not suitable for broad open-domain knowledge tasks; the advantages are concentrated on verifiable reasoning tasks.
- **Recommended inference settings**: temperature 0.6 or 1.0, top_p=0.95, top_k=-1, max_tokens=40960.
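The recommended settings can be captured in a small validated config object. The sketch below uses only the standard library; `SamplingConfig` and the two constant names are hypothetical, and the post does not say when to prefer temperature 0.6 over 1.0. The field names match the sampling parameters exposed by common inference engines such as vLLM, where `top_k=-1` conventionally disables top-k filtering.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Hypothetical container for the sampling settings quoted in the post."""

    temperature: float = 0.6
    top_p: float = 0.95
    top_k: int = -1          # -1 means "no top-k filtering"
    max_tokens: int = 40960  # long budget for extended reasoning traces

    def __post_init__(self) -> None:
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must lie in (0, 1]")
        if self.top_k == 0 or self.top_k < -1:
            raise ValueError("top_k must be -1 (disabled) or a positive integer")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


# The two temperature options listed in the post.
CONFIG_T06 = SamplingConfig(temperature=0.6)
CONFIG_T10 = SamplingConfig(temperature=1.0)

if __name__ == "__main__":
    # asdict() yields keyword arguments that can be forwarded to an engine.
    print(asdict(CONFIG_T06))
```

Keeping the values in one frozen object makes it harder for an evaluation run to drift from the published settings by accident.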

## Open-Source Contributions and Future Directions

- **Open-source contributions**: A cost-effective development path for reasoning models, a reference implementation of the SSP paradigm, a complete evaluation toolchain, and detailed hyperparameter configurations. The release ranked first on the Hugging Face trending list.
- **Technical insights**: Data quality matters more than scale, diversity is key to discovering good solutions, and optimization should focus on verifiable tasks.
- **Future directions**: Extending SSP to larger models, applying the CLR strategy across models, building multi-domain pipelines for verifiable tasks, and exploring a new paradigm of small models combined with external tools.

## Conclusion: A New Paradigm for Small Model Reasoning

With the SSP paradigm and a very low training budget, VibeThinker shows that small models have substantial potential on reasoning tasks, challenging the assumption that scale determines capability. It gives researchers and developers with limited resources a cost-effective path to high-performance reasoning models and encourages further innovation in small-model reasoning.
