Section 01
KnapsackRL: Optimizing Exploration Budget Allocation in LLM Reinforcement Learning Using the Knapsack Problem (Introduction)
This article introduces KnapsackRL, a project that applies the classic knapsack problem to exploration budget allocation in reinforcement learning (RL), with the aim of easing the exploration-exploitation dilemma in large language model (LLM) training. Because the search space in LLM training is enormous, finding high-quality trajectories efficiently under a limited compute budget is a key bottleneck. KnapsackRL frames budget allocation as a knapsack problem. In knapsack terms, each candidate use of the exploration budget is an item with a cost and a value, and the available budget is the capacity. Optimizing this allocation is intended to improve both training efficiency and model performance.
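To make the knapsack framing concrete, the following is a minimal sketch and not the project's actual implementation. It assumes a hypothetical setup in which each prompt has an estimated success rate, the cost of an allocation is the number of rollouts given to that prompt, and the value is the probability of sampling at least one successful rollout. The function name `allocate_budget`, its parameters, and the value function are all illustrative assumptions. The sketch solves the resulting multiple-choice knapsack with dynamic programming.

```python
from typing import List, Tuple


def allocate_budget(
    success_rates: List[float],
    total_budget: int,
    max_per_prompt: int,
) -> Tuple[float, List[int]]:
    """Split `total_budget` rollouts across prompts to maximize total value.

    Assumption (illustrative only): the value of giving n rollouts to a
    prompt with success rate p is 1 - (1 - p)**n, the chance of observing
    at least one successful rollout.
    """

    def value(p: float, n: int) -> float:
        return 1.0 - (1.0 - p) ** n if n > 0 else 0.0

    # best[b] = best total value with at most b rollouts over the prompts seen so far
    best = [0.0] * (total_budget + 1)
    # choice[i][b] = rollouts given to prompt i when b rollouts remain for prompts 0..i
    choice: List[List[int]] = []

    for p in success_rates:
        new_best = [0.0] * (total_budget + 1)
        picks = [0] * (total_budget + 1)
        for b in range(total_budget + 1):
            for n in range(min(max_per_prompt, b) + 1):
                candidate = best[b - n] + value(p, n)
                if candidate > new_best[b]:
                    new_best[b] = candidate
                    picks[b] = n
        best = new_best
        choice.append(picks)

    # Backtrack from the last prompt to recover the per-prompt allocation.
    allocation = [0] * len(success_rates)
    remaining = total_budget
    for i in range(len(success_rates) - 1, -1, -1):
        allocation[i] = choice[i][remaining]
        remaining -= allocation[i]

    return best[total_budget], allocation


if __name__ == "__main__":
    total_value, allocation = allocate_budget([0.9, 0.1], total_budget=4, max_per_prompt=4)
    print(total_value, allocation)
```

With these assumed inputs, the easy prompt quickly hits diminishing returns, so part of the budget shifts to the harder prompt instead of being split by a fixed rule. A real system would need its own estimates of cost and value; the ones here exist only to show the shape of the optimization.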