Section 01
【Introduction】Inference-Time Computational Optimization for Reasoning Models: A Comparative Study of SFT and GRPO Fine-Tuning Strategies
This study examines how different inference-time compute strategies (majority voting, Best-of-N, PRM-guided beam search, and budget enforcement) affect reasoning accuracy under a fixed inference compute budget, and compares how those effects differ between two fine-tuning methods: supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO). The core question is whether the optimal inference-time strategy depends on the fine-tuning method. The study reveals an interaction between the fine-tuning method and the inference-time strategy, which can inform the design of efficient reasoning systems.
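As a rough illustration of two of the strategies named above, the following Python sketch contrasts majority voting with Best-of-N on the same set of sampled answers, that is, under the same sampling budget. The function names, the toy answers, and the verifier scores are all assumptions made for illustration; they are not taken from the study, and a real Best-of-N setup would score full completions with a trained reward model rather than a lookup table.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among the sampled completions."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the answer that a (hypothetical) verifier scores highest."""
    return max(answers, key=score)


# Toy data (assumed): five sampled final answers for one problem, i.e. N = 5.
samples = ["12", "15", "12", "15", "12"]

# Hypothetical verifier scores standing in for a reward model's output.
toy_scores = {"12": 0.40, "15": 0.90}

print(majority_vote(samples))                      # most common answer
print(best_of_n(samples, toy_scores.__getitem__))  # highest-scored answer
```

On this toy input the two strategies disagree: majority voting picks the answer that appears most often, while Best-of-N picks the answer the verifier prefers. Which of the two is more accurate in practice depends on how well calibrated the sampled answers and the verifier are, which is the kind of dependence the study investigates across SFT and GRPO models.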