GPS Framework: A Graph-Guided Approach to Enabling Large Language Models to Ask Proactively

GPS (Graph-guided Proactive Information Seeking) is an innovative training framework that uses graph structures to guide large language models to proactively seek information, addressing the information gap problem in complex question-answering tasks.

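Tags: GPS, proactive information seeking, large language models, graph-guided, reinforcement learning, DAPO, CondQA, question-answering systems, ICLR 2026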
Published 2026-04-26 15:34 | Recent activity 2026-04-26 15:51 | Estimated read 6 min

Section 01

[Introduction] GPS Framework: A Graph-Guided Approach to Enabling Large Models to Ask Proactively

GPS (Graph-guided Proactive Information Seeking) is a training framework proposed by research teams from institutions including Peking University. It uses graph structures to guide large language models to proactively seek information, addressing the information gap problem in complex question-answering tasks. The work has been accepted at ICLR 2026. At its core, GPS models information dependencies as a graph and uses reinforcement learning to optimize the information-seeking policy, so that the model shifts from passive answering to active collaboration and answers complex questions more accurately and reliably.

Section 02

Background: The Challenge of 'Information Blind Spots' in Large Models

Current large language models typically answer as if they already had all the information needed to complete a task. In real-world scenarios the information is often incomplete, and the model gives a wrong answer as a result. This risk is particularly significant in high-stakes fields like medical diagnosis and legal consultation. Teaching models to recognize the boundaries of their knowledge, ask proactively, and seek the information they need has become a cutting-edge direction in AI research.

Section 03

Core Idea of the GPS Framework

The core insight of GPS is to use graph structures to model the information dependencies of a complex problem, guiding the model to learn when to ask, whom to ask, and what to ask. It replaces the traditional one-shot approach to information acquisition with an iterative information-seeking loop: at each step the model judges whether it has sufficient information and, if not, identifies what is missing and whom to ask for it. The model thereby changes from a passive knowledge base into an active collaborator.
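The iterative loop described above can be illustrated with a minimal sketch. This is not the GPS implementation: the function name `seek_and_answer`, its parameters, and the fixed list of required facts are hypothetical, and in GPS the decision about what to ask is learned by the model rather than hard-coded.

```python
from typing import Callable, Dict, List, Optional


def seek_and_answer(
    required: List[str],
    known: Dict[str, str],
    ask: Callable[[str], Optional[str]],
    answer: Callable[[Dict[str, str]], str],
    max_turns: int = 5,
) -> str:
    """Ask for missing facts one at a time, then answer (illustrative only)."""
    facts = dict(known)  # copy so the caller's dict is not mutated
    for _ in range(max_turns):
        missing = [key for key in required if key not in facts]
        if not missing:
            break  # sufficient information: stop asking
        reply = ask(missing[0])  # "what to ask": the first missing fact
        if reply is None:
            break  # the other party cannot supply this fact
        facts[missing[0]] = reply
    if all(key in facts for key in required):
        return answer(facts)
    return "insufficient information"


# Hypothetical usage: a simulated user who knows their residency.
user_knowledge = {"residency": "UK"}
result = seek_and_answer(
    required=["age", "residency"],
    known={"age": "70"},
    ask=user_knowledge.get,
    answer=lambda facts: f"age={facts['age']}, residency={facts['residency']}",
)
print(result)
```

The key design point is the sufficiency check at the top of each turn: the loop asks only while something is missing, and it declines to answer when a required fact cannot be obtained.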

Section 04

Technical Implementation: Combining Graph Structures and Reinforcement Learning

Graph-Guided Information Dependency Modeling

GPS models the information dependencies of a problem as a directed graph, in which nodes represent information units (entities, attributes, relationships) and edges represent dependencies between them. The graph provides a clear blueprint of what information is needed and serves as an interpretable intermediate representation.
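As a rough illustration of such a graph, the sketch below stores each information unit with the units it depends on and derives which inputs are still missing for a target question. The node names, the `DEPENDS_ON` mapping, and the helper `missing_inputs` are assumptions made for this example and are not taken from the GPS code.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency graph: node -> the nodes it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "eligible_for_benefit": {"age", "residency_status"},
    "residency_status": {"country", "years_in_country"},
    "age": set(),
    "country": set(),
    "years_in_country": set(),
}


def resolution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


def missing_inputs(graph: Dict[str, Set[str]], target: str, known: Set[str]) -> List[str]:
    """Leaf facts the target transitively depends on that are not yet known."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen or node in known:
            continue  # already visited, or resolved without asking
        seen.add(node)
        deps = graph.get(node, set())
        if deps:
            stack.extend(deps)  # derived fact: descend into its inputs
        else:
            missing.add(node)  # leaf fact: must be asked for
    return sorted(missing)


print(resolution_order(DEPENDS_ON))
print(missing_inputs(DEPENDS_ON, "eligible_for_benefit", known={"age"}))
```

Because a known node stops the traversal, supplying a derived fact directly (here, `residency_status`) removes the need to ask for its inputs.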

Reinforcement Learning Training Based on DAPO

Training uses the DAPO algorithm (Decoupled Clip and Dynamic Sampling Policy Optimization), run as distributed training on the verl framework, to optimize information-seeking efficiency: obtaining the key information with the fewest queries. The pipeline consists of data preprocessing (converting CondQA to Parquet), policy training, and evaluation on multiple test sets (DAG, conditional QA, ShARC) to check generalization.
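One way to picture the efficiency objective is a reward that credits a correct final answer and charges a small cost per query. The sketch below is an assumption for illustration only: the function `seeking_reward`, the cost of 0.1 per query, and the cap of 5 queries are hypothetical, and the actual GPS reward may be defined differently.

```python
def seeking_reward(
    correct: bool,
    num_queries: int,
    query_cost: float = 0.1,  # hypothetical per-query penalty
    max_queries: int = 5,     # hypothetical cap on the penalized queries
) -> float:
    """Reward a correct answer, minus a cost for each query asked."""
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    base = 1.0 if correct else 0.0
    penalty = query_cost * min(num_queries, max_queries)
    return base - penalty


# A correct answer reached with fewer queries scores higher.
print(seeking_reward(True, 1), seeking_reward(True, 3), seeking_reward(False, 3))
```

Under this shaping, a policy is pushed toward asking only the questions that change the answer, since each redundant query lowers the return.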

Section 05

Experimental Results and Performance

On the CondQA benchmark, GPS outperforms traditional end-to-end models, with a significant accuracy gain in proactive information-seeking scenarios. It asks for enough information to answer correctly while avoiding redundant queries. The open-source implementation is based on Qwen2.5-7B-Instruct, is trained on 4 GPUs, and uses vLLM to accelerate inference. The hardware requirements are moderate, and the training scripts expose a wide range of configuration options.

Section 06

Application Scenarios and Future Outlook

GPS can be applied to scenarios such as customer service (intelligent chatbots), medical consultation (helping to collect patient information), and educational tutoring (identifying gaps in a learner's knowledge). Looking ahead, large models are expected to evolve into dynamic, interactive agents that proactively explore and learn. The code, built on the verl framework, has been open-sourced, which makes it easy for the community to build on and lays a technical foundation for proactive information seeking in next-generation AI systems.