Section 01
[Introduction] KBQA-R1: A New Knowledge Base Question Answering Framework That Empowers Large Language Models with Reinforcement Learning
KBQA-R1 is a reinforcement learning-based framework for knowledge base question answering (KBQA). At its core, it models KBQA as a multi-turn Markov Decision Process (MDP) and optimizes the policy with Group Relative Policy Optimization (GRPO), yielding significant improvements on the WebQSP and GrailQA datasets. Its key innovations include an action-centric design, data synthesis via Reference Rejection Sampling (RRS), and a four-stage training pipeline, which together offer a new paradigm for how large language models (LLMs) interact with external knowledge bases.
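To make the "group relative" idea in GRPO concrete, the following is a minimal sketch of how advantages are commonly computed in GRPO-style training: several rollouts are sampled for the same question, and each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, the `eps` constant, and the use of the population standard deviation are illustrative assumptions; this summary does not specify KBQA-R1's actual reward design or implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar reward per rollout sampled for the same
    question. `eps` (an assumed constant) avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one question
# (e.g. 1.0 if the final answer was correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline. When all rollouts in a group earn the same reward, every advantage is zero and that group contributes no learning signal.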