Section 01
Introduction: Feedback Distillation—A New Breakthrough in Reasoning Training for Lean Theorem Proving
This article is based on the paper 'Distilling LLM Feedback for Lean Theorem Proving', published on arXiv in May 2026 (link: http://arxiv.org/abs/2605.30861v1). The authors propose a training method called Feedback Distillation, which targets three problems that the GRPO algorithm faces in Lean 4 theorem proving: sparse rewards, limited exploration, and mode collapse. The method is reported to improve trajectory diversity and pass@k performance, and it is complementary to GRPO.