# Small_Scale: Pruning Long Chain-of-Thought in Large Reasoning Models via Small-Scale Preference Optimization

> The Small_Scale project provides the official implementation of the ICLR 2026 paper. It includes a complete toolkit for offline LLM inference and evaluation and a DPO training framework. The toolkit supports vLLM/SGLang backends, benchmarks across several task types, and preference optimization training based on LLaMA-Factory.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-31T06:05:31.000Z
- Last activity: 2026-03-31T06:26:17.592Z
- Popularity: 154.7
- Keywords: LLM, reasoning, chain-of-thought, pruning, preference optimization, DPO, vLLM, SGLang, evaluation, ICLR
- Page link: https://www.zingnex.cn/en/forum/thread/small-scale
- Canonical: https://www.zingnex.cn/forum/thread/small-scale

---

## Introduction to the Small_Scale Project

Small_Scale is the official open-source implementation of the ICLR 2026 paper *Pruning Long Chain-of-Thought in Large Reasoning Models via Small-Scale Preference Optimization*. The paper targets the high computational cost of long chain-of-thought reasoning. Its approach is to prune that reasoning with small-scale preference optimization. Alongside the method, the repository provides infrastructure for research and development on reasoning models:

- a complete toolkit for offline LLM inference and evaluation, with vLLM and SGLang backends and benchmarks covering several task types;
- a DPO training framework built on LLaMA-Factory.

## Research Background and Challenges

Large reasoning models solve complex problems with long chain-of-thought (CoT). However, excessive reasoning incurs large computational overhead and latency, which limits deployment efficiency. Conventional approaches require fine-tuning on large datasets or full retraining, both of which are resource-intensive. The core insight of Small_Scale is that small-scale preference optimization can effectively prune redundant chain-of-thought content without sacrificing reasoning quality.
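This summary does not spell out the training objective. As background, below is a minimal sketch of the standard (sigmoid) DPO loss for a single preference pair. It assumes the summed response log-probabilities under the policy and under a frozen reference model are already computed. For CoT pruning, one plausible reading is that "chosen" is a concise correct chain and "rejected" a verbose one. That pairing, and `beta=0.1`, are illustrative assumptions. This is the generic formulation, not necessarily the paper's exact objective.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Each log-probability is summed over the response tokens, computed
    under the policy being trained or under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(round(dpo_loss(-50.0, -120.0, -50.0, -120.0), 4))  # 0.6931
# Policy shifted toward the chosen response: margin is 1.0, loss drops.
print(round(dpo_loss(-45.0, -125.0, -50.0, -120.0), 4))  # 0.3133
```

Minimizing this loss raises the policy's relative likelihood of the chosen response. If the chosen responses are the shorter ones, that pressure pushes generations toward shorter reasoning.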

## Project Overview and Toolkit Architecture

Beyond implementing the paper's method, Small_Scale ships a full-featured LLM evaluation and training toolkit that covers the complete workflow. The toolkit is organized into modular layers:
- Configuration layer (config/): Manages global paths, dataset metadata, and other settings;
- Data layer (data/test/): Bundles widely used benchmark datasets in Parquet format, in three categories: mathematics, code, and multiple-choice questions;
- Inference layer (eval/generation/): Supports the vLLM backend (multi-process, random-shuffle, and single-process modes) and the SGLang backend;
- Evaluation layer (eval/judgers/): Implements dedicated judges for mathematics, code, and multiple-choice questions, plus an LLM-as-Judge mode;
- Training layer (LLaMA-Factory/): An integrated LLaMA-Factory framework that supports DPO training with DeepSpeed ZeRO-3 configuration.
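The repository's actual config schema is not shown in this summary. The sketch below is a hypothetical illustration of how dataset metadata could link the data layer to the evaluation layer. Each entry maps a dataset name to its task category, which selects a judge, and to its Parquet file. Every dataset key, field name, and file path here is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetMeta:
    name: str
    task_type: str       # "math", "code", or "multiple_choice"
    relative_path: str   # Parquet file relative to the data root


# Hypothetical registry; the real names and paths live in the repo's config/.
DATASETS = {
    "gsm8k": DatasetMeta("gsm8k", "math", "math/gsm8k.parquet"),
    "aime": DatasetMeta("aime", "math", "math/aime.parquet"),
    "livecodebench": DatasetMeta("livecodebench", "code", "code/livecodebench.parquet"),
    "mmlu": DatasetMeta("mmlu", "multiple_choice", "choice/mmlu.parquet"),
}


def resolve_dataset(name: str, data_root: str = "data/test") -> tuple[Path, str]:
    """Return (parquet_path, task_type) for a registered dataset name."""
    key = name.lower()
    if key not in DATASETS:
        raise KeyError(f"unknown dataset {name!r}; known: {sorted(DATASETS)}")
    meta = DATASETS[key]
    return Path(data_root) / meta.relative_path, meta.task_type


path, task = resolve_dataset("GSM8K")
print(path, task)
```

The resolved path could then be loaded with `pandas.read_parquet(path)`. The task type tells the evaluation layer which judge to call.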

## Detailed Explanation of Core Features

1. **Flexible Inference Backends**: Supports vLLM (multi-process data parallelism, random shuffle, or single process) and SGLang, to suit different deployment scenarios;
2. **Comprehensive Benchmark Tests**: Covers mathematics (AIME, GSM8K, and others), code (LiveCodeBench), and multiple-choice (MMLU and others) tasks, each scored with a task-appropriate metric;
3. **Automated Evaluation**: The autojudger module identifies the task type, calls the matching judge, computes scores, and records logs;
4. **End-to-End Pipeline**: The inference script writes its output path to a temporary file, and the evaluation step reads that file, so inference and evaluation chain together without manual path passing.
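The sketch below illustrates items 3 and 4 together. It follows a pointer file to the generation output, dispatches each record to a judge by task type, and returns accuracy. It is not the repository's autojudger. The JSONL record fields (`task_type`, `response`, `answer`), the `\boxed{}` answer convention, and the letter-extraction heuristic are all assumptions. Code tasks are omitted because they need sandboxed test execution.

```python
from __future__ import annotations

import json
import re
from pathlib import Path


def extract_boxed(text: str) -> str | None:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    begin = start + len("\\boxed{")
    depth = 1
    for j in range(begin, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:j].strip()
    return None  # unbalanced braces: treat as no answer


def judge_math(response: str, answer: str) -> bool:
    # Whitespace-insensitive string match only; a real math judge would
    # also test numeric/symbolic equivalence (e.g. 0.5 vs \frac{1}{2}).
    pred = extract_boxed(response)
    return pred is not None and pred.replace(" ", "") == answer.replace(" ", "")


def judge_choice(response: str, answer: str) -> bool:
    # Crude heuristic: take the last standalone capital letter A-J.
    letters = re.findall(r"\b([A-J])\b", response)
    return bool(letters) and letters[-1] == answer.strip().upper()


# Code tasks (e.g. LiveCodeBench) would need a sandboxed runner; omitted here.
JUDGES = {"math": judge_math, "multiple_choice": judge_choice}


def autojudge(pointer_file: str | Path) -> float:
    """Follow the pointer file to the generation output (JSONL) and return accuracy."""
    output_path = Path(Path(pointer_file).read_text().strip())
    records = [json.loads(line)
               for line in output_path.read_text().splitlines() if line.strip()]
    results = [JUDGES[r["task_type"]](r["response"], r["answer"]) for r in records]
    return sum(results) / len(results) if results else 0.0
```

The pointer file decouples the two steps. The inference script can choose a timestamped output path, and the judge only needs to know one fixed location.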

## Usage Instructions

- **Environment Preparation**: Set the paths in config/path.yaml, place the model weights, and install Python 3.10+ with the required dependencies;
- **Inference**: For example, with the multi-process vLLM backend: `python eval/generation/vllm_offline.py --config ... --model_name ... --dataset_name ...`;
- **Automated Evaluation**: `python eval/judgers/autojudger.py --config ... --file_path ...`;
- **DPO Training**: After configuring dpo.yaml, launch training with `export CUDA_VISIBLE_DEVICES=...; llamafactory-cli train ...`.
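This summary does not say how the preference data used by dpo.yaml is built. As a hedged illustration, the sketch below writes pairs in LLaMA-Factory's Alpaca-style preference format: `instruction`/`input`/`chosen`/`rejected` fields, registered in `dataset_info.json` with `"ranking": true`.

The pairing rule is a hypothetical stand-in, not necessarily the paper's selection criterion:

- among responses a judge marked correct, the shortest is "chosen" and the longest is "rejected";
- character length stands in for token count.

The dataset name and file name are also hypothetical.

```python
from __future__ import annotations

import json


def build_pairs(samples: dict[str, list[tuple[str, bool]]],
                min_gap: int = 1) -> list[dict[str, str]]:
    """Turn judged samples into Alpaca-style preference records.

    `samples` maps each prompt to (response, is_correct) tuples.
    Hypothetical rule: chosen = shortest correct response, rejected =
    longest correct response; character length stands in for token count.
    """
    records = []
    for prompt, responses in samples.items():
        correct = sorted((r for r, ok in responses if ok), key=len)
        if len(correct) < 2:
            continue  # need two correct responses to compare lengths
        chosen, rejected = correct[0], correct[-1]
        if len(rejected) - len(chosen) < min_gap:
            continue  # lengths too similar to express a preference for brevity
        records.append({"instruction": prompt, "input": "",
                        "chosen": chosen, "rejected": rejected})
    return records


def write_dataset(records: list[dict[str, str]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Matching entry for LLaMA-Factory's data/dataset_info.json
# ("cot_prune_pairs" and the file name are hypothetical).
DATASET_INFO_ENTRY = {
    "cot_prune_pairs": {
        "file_name": "cot_prune_pairs.json",
        "ranking": True,
        "columns": {"prompt": "instruction", "query": "input",
                    "chosen": "chosen", "rejected": "rejected"},
    }
}
```

Once registered, the dataset name would presumably be referenced from the `dataset` field of dpo.yaml before running `llamafactory-cli train`.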

## Technical Highlights and Application Scenarios

**Technical Highlights**:
1. Data Parallelism Optimization: Multi-process vLLM sharding raises throughput, with optional random shuffling to reduce bias from the dataset's original ordering;
2. Flexible Sampling Configuration: A unified parameter structure exposes sampling settings such as temperature and top_p, and supports advanced options like tensor parallelism;
3. LLM-as-Judge: Can call the OpenAI API or other LLM services to judge complex outputs that rule-based judges cannot easily score.
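The internals of vllm_offline.py are not shown in this summary. The sketch below illustrates the general data-parallel pattern behind item 1: shuffle indices with a fixed seed, split them into near-equal shards, and merge each worker's outputs back into dataset order. It assumes each worker process runs its own inference engine on one shard, typically pinned to a GPU via `CUDA_VISIBLE_DEVICES`. The worker side is simulated here, and the function names are hypothetical.

```python
from __future__ import annotations

import random
from typing import Any


def make_shards(num_items: int, num_workers: int,
                shuffle: bool = True, seed: int = 0) -> list[list[int]]:
    """Split item indices into num_workers shards whose sizes differ by at most 1."""
    indices = list(range(num_items))
    if shuffle:
        # A seeded shuffle decouples shard assignment from the dataset's
        # original order (e.g. items sorted by difficulty) and stays reproducible.
        random.Random(seed).shuffle(indices)
    base, extra = divmod(num_items, num_workers)
    shards, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        shards.append(indices[start:start + size])
        start += size
    return shards


def merge_results(shards: list[list[int]],
                  shard_outputs: list[list[Any]]) -> list[Any]:
    """Put each worker's outputs back at their original dataset positions."""
    merged: list[Any] = [None] * sum(len(s) for s in shards)
    for idx_list, outputs in zip(shards, shard_outputs):
        for idx, output in zip(idx_list, outputs):
            merged[idx] = output
    return merged


shards = make_shards(10, 3, seed=42)
# Each worker would generate for its own shard; simulated here.
outputs = [[f"answer-{i}" for i in shard] for shard in shards]
print(merge_results(shards, outputs) == [f"answer-{i}" for i in range(10)])  # True
```

Because the merge step restores the original order, shuffling affects only how work is spread across processes. It does not change the final output file.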

**Application Scenarios**:
1. Research on Pruning Reasoning Models: Provides ready-made experimental infrastructure for studying chain-of-thought pruning;
2. Model Selection and Comparison: Produces comparable metrics through standardized benchmark tests;
3. Continuous Integration Monitoring: The scripted pipeline fits into CI/CD workflows for regression testing across model versions.
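For the CI use case, one simple gate compares each benchmark score from a new run against a stored baseline. The build then fails if any score drops by more than a tolerance. The sketch below is a hypothetical helper, not part of the repository. The benchmark names and scores in the example are placeholder values, not reported results.

```python
from __future__ import annotations


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = 0.01) -> dict[str, tuple[float, float | None]]:
    """Map each regressed benchmark to (baseline_score, current_score).

    A benchmark counts as regressed if its score fell by more than
    `tolerance` (absolute) or if it is missing from the current run.
    """
    regressions: dict[str, tuple[float, float | None]] = {}
    for bench, base_score in baseline.items():
        cur = current.get(bench)
        if cur is None or base_score - cur > tolerance:
            regressions[bench] = (base_score, cur)
    return regressions


# Placeholder scores for illustration only.
baseline = {"gsm8k": 0.90, "aime": 0.30, "mmlu": 0.70}
current = {"gsm8k": 0.895, "aime": 0.25}
print(sorted(find_regressions(baseline, current)))  # ['aime', 'mmlu']
```

In a CI job, the script could end with `sys.exit(1 if regressions else 0)` so that any regression fails the pipeline. Treating a missing benchmark as a regression keeps silently skipped evaluations from passing.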

## Academic Contributions and Summary

**Academic Contributions**: The accompanying paper was accepted at ICLR 2026. It proposes pruning long chain-of-thought via small-scale preference optimization, balancing reasoning ability against efficiency.

**Summary**: Small_Scale is not only an implementation of the paper but also a full-featured LLM evaluation and training infrastructure. Its modular architecture and multi-backend support lower the barrier to entry for research on reasoning models.
