Section 01
KnapsackRL: Optimizing the LLM RL Exploration Budget with the Knapsack Problem
KnapsackRL combines the classic knapsack problem with reinforcement learning (RL) to allocate the exploration budget when training large language models (LLMs). It targets the core challenge of RL training under limited resources: the exploration budget plays the role of the knapsack capacity and candidate trajectories play the role of the items, so deciding what to explore becomes a matter of maximizing learning value within a fixed capacity while wasting as little of the budget as possible.
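To make the mapping concrete, here is a minimal sketch of a textbook 0/1 knapsack solved by dynamic programming, with the budget as capacity and candidate trajectories as items. It is an illustration only, not the project's actual algorithm: the function name `allocate_budget`, the integer per-trajectory `costs`, and the `values` (a stand-in for some estimate of learning benefit) are all assumptions introduced here.

```python
from typing import List, Tuple


def allocate_budget(
    costs: List[int], values: List[float], capacity: int
) -> Tuple[float, List[int]]:
    """Pick a subset of candidates maximizing total value within `capacity`.

    Hypothetical inputs: costs[i] is the exploration cost of candidate i
    (for example, a number of rollouts) and values[i] is its estimated
    learning benefit. Returns (best total value, chosen indices).
    """
    n = len(costs)
    # best[i][c] = max value achievable with the first i candidates and budget c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = costs[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip candidate i-1
            if cost <= c:
                # option 2: take candidate i-1 if it fits in the budget
                best[i][c] = max(best[i][c], best[i - 1][c - cost] + value)

    # Walk the table backwards to recover which candidates were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= costs[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


# Three candidates with costs 3, 4, 2 and a total budget of 5.
total, picked = allocate_budget([3, 4, 2], [4.0, 5.0, 3.0], capacity=5)
print(total, picked)  # 7.0 [0, 2]
```

With a budget of 5, the highest-value single candidate (index 1, value 5.0) loses to the pair of cheaper candidates (indices 0 and 2, combined value 7.0), which is the kind of trade-off a knapsack formulation is meant to surface.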