Zing Forum

SCPRM: A Schema-Aware Cumulative Process Reward Model for Knowledge Graph Question Answering

To address the challenge of process reward evaluation for large models in knowledge graph reasoning, this paper proposes SCPRM. By combining schema distance with a cumulative reward mechanism, SCPRM mitigates the risk compensation effect and improves Hits@k by an average of 1.18% on medical and legal knowledge graph question answering tasks.

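Tags: Knowledge Graph Question Answering, Process Reward Model, Cumulative Reward, Schema Awareness, Monte Carlo Tree Search, Multi-hop Reasoning, Medical Knowledge Graph, Legal AI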
Published 2026-05-05 00:56 | Recent activity 2026-05-05 12:20 | Estimated read 4 min

Section 01

[Overview] SCPRM: A Schema-Aware Cumulative Process Reward Model for Knowledge Graph Question Answering

This paper proposes SCPRM to address the challenge of process reward evaluation for large models in knowledge graph reasoning. By combining schema distance with a cumulative reward mechanism, SCPRM mitigates the risk compensation effect and improves Hits@k by an average of 1.18% on medical and legal knowledge graph question answering tasks.

Section 02

[Background] Existing Challenges in Knowledge Graph Reasoning Evaluation

In reasoning evaluation for large models, traditional outcome reward models score only the final answer and so cannot guide intermediate steps. Existing process reward models suffer from the risk compensation effect: an incorrect intermediate step still receives a high reward if it is corrected later. Knowledge Graph Question Answering (KGQA) adds challenges of its own: multiple valid paths to an answer, high risk sensitivity (a wrong path can have serious consequences in medical or legal settings), and schema constraints.
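The risk compensation effect can be illustrated with a toy sketch. It assumes, purely for illustration, that a step's reward is estimated as the fraction of rollouts from that step that eventually reach a correct answer; the summary does not say how existing process reward models compute their scores. The `Step` class, the relation names, and the `is_risky` label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    relation: str            # hypothetical KG relation followed at this hop
    is_risky: bool           # hypothetical label: the step itself is wrong or unsafe
    rollout_outcomes: list   # True where a rollout from this step reached a correct answer


def rollout_reward(step: Step) -> float:
    """Score a step by its rollout success rate, ignoring whether the step itself is sound."""
    return sum(step.rollout_outcomes) / len(step.rollout_outcomes)


# Two steps with identical downstream success, only one of which is risky.
safe = Step("treated_by", False, [True, True, True, False])
risky = Step("contraindicated_with", True, [True, True, True, False])

for step in (safe, risky):
    print(step.relation, step.is_risky, rollout_reward(step))
```

Under this assumed scoring rule both steps receive the same reward, because later corrections in the rollouts mask the risky step. That is the behavior the cumulative, schema-aware reward is meant to avoid.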

Section 03

[Methodology] Core Innovations of the SCPRM Model and Integration with MCTS

SCPRM introduces two key innovations:
1. Cumulative reward mechanism: each step is scored conditioned on its reasoning prefix, so the reward reflects how coherent the step is with the history that precedes it.
2. Schema distance awareness: the model measures how well a step conforms, at the schema level, to the implicit target of the query, which lets it distinguish legitimate detours from wrong deviations.
SCPRM is then integrated into the Monte Carlo Tree Search (MCTS) framework, yielding SCPRM-MCTS, in which the reward model guides the search.
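A minimal sketch of how the two ideas might combine is given below. It is not the paper's formulation, which this summary does not state. The assumptions are: schema distance is the hop count between entity types in a toy schema graph, a step is penalized by `decay` for every hop it moves away from the target type, and the cumulative reward is a running minimum so that a later good step cannot lift the score of a bad prefix. The `SCHEMA` graph, `decay`, and both function names are hypothetical.

```python
from collections import deque

# Hypothetical toy schema: entity type -> directly connected entity types.
SCHEMA = {
    "Symptom": ["Disease"],
    "Disease": ["Symptom", "Drug", "Department"],
    "Drug": ["Disease", "Manufacturer"],
    "Department": ["Disease"],
    "Manufacturer": ["Drug"],
}


def schema_distance(src, dst, schema=SCHEMA):
    """Breadth-first hop count between two types; None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in schema.get(node, []):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def cumulative_rewards(start_type, step_types, local_scores, target_type, decay=0.5):
    """Prefix-conditioned scores: each reward is capped by the reward of its prefix."""
    rewards = []
    running = 1.0
    prev_dist = schema_distance(start_type, target_type)
    for step_type, local in zip(step_types, local_scores):
        dist = schema_distance(step_type, target_type)
        if dist is None or prev_dist is None:
            factor = 0.0  # step left the part of the schema that can reach the target
        else:
            factor = decay ** max(0, dist - prev_dist)  # penalize moving away only
        running = min(running, local * factor)
        rewards.append(running)
        prev_dist = dist
    return rewards


# A path that heads toward the target type versus one that drifts away and comes back.
direct = cumulative_rewards("Symptom", ["Disease", "Drug"], [0.9, 0.8], "Drug")
drifting = cumulative_rewards(
    "Symptom", ["Disease", "Department", "Disease", "Drug"], [0.9, 0.9, 0.9, 0.9], "Drug"
)
print(direct)
print(drifting)
```

In this sketch the drifting path keeps a reduced score even after it returns to the right type, which is one simple way to prevent compensation. Type-level hop distance is a crude proxy: it cannot by itself tell a legitimate detour from a wrong deviation, which the paper's schema-aware model is described as doing.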

Section 04

[Experiments] Performance Verification of SCPRM-MCTS

SCPRM-MCTS was evaluated on medical and legal KG datasets and on the general-purpose CWQ dataset. Hits@k improved by an average of 1.18%. The method showed significant advantages in risk-sensitive reasoning scenarios, where it reduced the proportion of high-risk wrong steps and thereby improved reliability in practical applications.
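For readers unfamiliar with the metric, Hits@k is commonly computed as the fraction of questions for which at least one gold answer appears among the top k ranked predictions. The sketch below follows that common definition; the paper's exact evaluation protocol is not given in this summary, and the sample predictions and answers are made up.

```python
def hits_at_k(ranked_predictions, gold_answers, k):
    """Fraction of questions with at least one gold answer in the top-k predictions."""
    hits = 0
    for preds, gold in zip(ranked_predictions, gold_answers):
        if any(p in gold for p in preds[:k]):
            hits += 1
    return hits / len(gold_answers)


# Made-up ranked predictions and gold answer sets for three questions.
predictions = [
    ["aspirin", "ibuprofen"],
    ["statute_12", "statute_7"],
    ["cardiology", "neurology"],
]
gold = [{"ibuprofen"}, {"statute_3"}, {"cardiology"}]

print(hits_at_k(predictions, gold, k=1))
print(hits_at_k(predictions, gold, k=2))
```

Increasing k can only keep the score the same or raise it, so reported gains are usually broken down per value of k.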

Section 05

[Conclusion and Recommendations] Contributions and Application Insights of SCPRM

Technical contributions: SCPRM refines process reward evaluation, uses schema knowledge to optimize reasoning, and offers a path toward risk-aware reinforcement learning. Insights: KGQA systems should emphasize the quality of reasoning paths, not only final answers; in high-risk fields, adding process evaluation mechanisms can improve credibility and practicality.