Section 01
[Main Floor] Groundbreaking Research on Building Dense Reasoning Reward Models from Expert Demonstrations via Inverse Reinforcement Learning
This study uses Inverse Reinforcement Learning (IRL) to extract the implicit reward signals behind expert reasoning from expert demonstrations. From these signals it builds a dense reward model that scores the quality of the reasoning process step by step, rather than only the final answer. The aim is to address reward sparsity in LLM reasoning training, where a reward is typically available only for the final outcome, and to move models from imitating expert answers toward learning expert thinking processes.
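The study does not specify its IRL algorithm or reward features. As a rough illustration of how IRL can turn whole demonstrations into a step-level signal, here is a minimal sketch that fits a linear per-step reward with a simplified maximum-entropy IRL objective over a small, enumerated set of candidate traces. The three step features, the toy traces, and the function names (`step_reward`, `trace_features`, `maxent_irl`) are hypothetical. A real system would more likely learn features from model representations and sample candidates from the policy.

```python
import math
from typing import List, Sequence

# A reasoning trace is a list of steps; each step is a feature vector.
# HYPOTHETICAL step features, for illustration only:
#   [0] step uses a given fact, [1] step verifies an earlier step,
#   [2] step makes an unsupported leap
Step = Sequence[float]
Trace = List[Step]


def step_reward(w: Sequence[float], step: Step) -> float:
    """Dense per-step reward r(s) = w . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w, step))


def trace_features(trace: Trace, dim: int) -> List[float]:
    """Sum of step features along a trace (its feature counts)."""
    total = [0.0] * dim
    for step in trace:
        for i in range(dim):
            total[i] += step[i]
    return total


def maxent_irl(expert: List[Trace], candidates: List[Trace],
               dim: int, lr: float = 0.1, iters: int = 200) -> List[float]:
    """Simplified maximum-entropy IRL over a finite candidate set.

    p(trace) is proportional to exp(sum of step rewards), normalized over
    `candidates` (which should include the expert traces). The gradient of
    the expert log-likelihood is expert feature counts minus the
    model-expected feature counts; we take plain gradient-ascent steps.
    """
    w = [0.0] * dim
    expert_f = [sum(col) / len(expert)
                for col in zip(*(trace_features(t, dim) for t in expert))]
    cand_f = [trace_features(t, dim) for t in candidates]
    for _ in range(iters):
        scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in cand_f]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        model_f = [sum(p * f[i] for p, f in zip(probs, cand_f))
                   for i in range(dim)]
        w = [wi + lr * (ef - mf) for wi, ef, mf in zip(w, expert_f, model_f)]
    return w


# Toy usage: two expert traces plus two weaker alternatives.
expert = [
    [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
]
weak = [
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
candidates = expert + weak
w = maxent_irl(expert, candidates, dim=3)

# Every step of a trace now gets its own reward, not just the final answer.
for trace in candidates:
    print([round(step_reward(w, s), 3) for s in trace])
```

Because the learned reward is defined per step, every step of a sampled trace receives a score instead of one scalar for the final answer, which is the sense in which the reward is dense. Steps resembling expert behavior (using facts, checking earlier work) end up with positive reward, while unsupported leaps are penalized.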