Zing Forum

GraphSSR: Adaptive Subgraph Denoising Framework for Zero-Shot Graph Learning with Large Language Models

GraphSSR, a paper accepted at ACM SIGKDD 2026, comes with an open-source implementation. It uses two-stage reinforcement learning to sample and denoise subgraphs adaptively, addressing the noise sensitivity of large language models in graph learning.

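Tags: GraphSSR, graph learning, large language models, subgraph denoising, zero-shot learning, reinforcement learning, graph neural networks, ACM SIGKDD, adaptive sampling, knowledge graphs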
Published 2026-05-31 17:45 | Recent activity 2026-05-31 17:49 | Estimated read 7 min

Section 01

GraphSSR: A Guide to the Adaptive Subgraph Denoising Framework for Zero-Shot Graph Learning with LLMs

GraphSSR is a paper accepted at ACM SIGKDD 2026 with an open-source implementation. The framework uses two-stage reinforcement learning to sample and denoise subgraphs adaptively, addressing the noise sensitivity of large language models (LLMs) in graph learning; it is aimed in particular at zero-shot graph learning. The original author is mysteriouslfz, and the project is hosted on GitHub (https://github.com/mysteriouslfz/GraphSSR), released on 2026-05-31.

Section 02

Research Background and Challenges

Combining Graph Neural Networks (GNNs) with Large Language Models (LLMs) is an important direction for processing graph-structured data. Real-world graphs, however, often contain many noisy nodes and edges, which degrades inference performance. Traditional fixed-size subgraph sampling cannot adapt to problem complexity: simple problems need only a few nodes, while complex ones need a larger context. In zero-shot graph learning the model must reason over graph data it has not seen before, which raises the bar for both sampling accuracy and denoising. The key challenge is therefore how to adjust the sampling range dynamically while filtering out noise.
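To make the limitation of fixed-size sampling concrete, here is a minimal sketch of a conventional k-hop sampler. It is not taken from the GraphSSR repository; the function name `k_hop_subgraph`, the adjacency-dict format, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def k_hop_subgraph(adj, seed, k, max_nodes=None):
    """Return nodes within k hops of `seed` in BFS order.

    `adj` is a plain adjacency dict (an assumed format, not GraphSSR's).
    If `max_nodes` is given, sampling stops once that many nodes are collected.
    """
    visited = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond the hop budget
        for nbr in adj.get(node, []):
            if nbr in visited:
                continue
            if max_nodes is not None and len(order) >= max_nodes:
                return order
            visited.add(nbr)
            order.append(nbr)
            queue.append((nbr, depth + 1))
    return order


# Hypothetical toy graph: "q" is the query node.
toy_graph = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}

one_hop = k_hop_subgraph(toy_graph, "q", k=1)  # small context
two_hop = k_hop_subgraph(toy_graph, "q", k=2)  # larger context, more potential noise
```

The hop budget `k` is fixed before the question is seen, so the same radius is applied to easy and hard queries alike. That is the rigidity the adaptive approach is meant to remove.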

Section 03

Core Idea of GraphSSR

GraphSSR (Adaptive Subgraph Denoising via Sample-Select-Reason) proposes an adaptive subgraph denoising paradigm. Its core insight is that problems of different difficulty need subgraphs of different sizes, and that oversized subgraphs tend to carry more noise. The model follows a three-step 'Sample-Select-Reason' pipeline: it first samples candidate subgraphs, then evaluates them and selects the best one, and finally reasons over the selected subgraph. This explicitly balances subgraph completeness against purity.
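The Sample-Select-Reason flow can be illustrated with a toy sketch. This does not reproduce how GraphSSR implements each step. The hop-radius candidates, the per-node `relevance` scores, the `size_penalty` term, and the majority-vote reasoner are all hypothetical stand-ins chosen to show the completeness-versus-purity trade-off.

```python
from collections import Counter


def sample_candidates(adj, seed, max_hops):
    """Sample: build one candidate subgraph per hop radius 0..max_hops."""
    seen = {seed}
    frontier = {seed}
    candidates = [frozenset(seen)]
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in adj.get(f, [])} - seen
        seen = seen | frontier
        candidates.append(frozenset(seen))
    return candidates


def select_subgraph(candidates, relevance, size_penalty=0.1):
    """Select: reward covered relevance (completeness), penalize size (purity)."""
    def score(nodes):
        covered = sum(relevance.get(n, 0.0) for n in nodes)
        return covered - size_penalty * len(nodes)
    return max(candidates, key=score)


def reason(subgraph, labels):
    """Reason: placeholder majority vote over labelled nodes in the subgraph."""
    votes = Counter(labels[n] for n in sorted(subgraph) if n in labels)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy graph and relevance scores (illustrative values only).
adj = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q", "d"],
       "c": ["a"], "d": ["b", "e"], "e": ["d"]}
candidates = sample_candidates(adj, "q", max_hops=3)

hard_relevance = {"q": 1.0, "a": 0.8, "b": 0.05, "c": 0.6}
easy_relevance = {"q": 1.0, "a": 0.05, "b": 0.05}

hard_pick = select_subgraph(candidates, hard_relevance)  # needs the 2-hop context
easy_pick = select_subgraph(candidates, easy_relevance)  # the query node suffices
answer = reason(hard_pick, {"a": "ML", "b": "DB", "c": "ML"})
```

With these toy scores the harder query selects the two-hop candidate while the easier one keeps only the query node, which mirrors the idea that subgraph size should follow problem difficulty.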

Section 04

Technical Architecture and Training Process

GraphSSR training is divided into two stages: Supervised Fine-Tuning (SSR-SFT) and Reinforcement Learning (SSR-RL):

  1. SSR-SFT Stage: The goal is to teach the model basic subgraph reasoning. Training samples are built from the GraphR1 dataset: a teacher model generates reasoning trajectories, which are then filtered by answer correctness and structural diversity. Distributed training uses the LlamaFactory framework, and vLLM serves the teacher model and the diversity evaluation model.
  2. SSR-RL Stage: This stage uses the verl framework and is split into two sub-stages:
    • Truthfulness Reinforcement Learning: The reward function R1 enforces subgraph truthfulness, selection consistency, and answer correctness.
    • Denoising Reinforcement Learning: The reward function R2 adds a subgraph size reward to R1 (when the answer is correct, a smaller subgraph earns a higher reward), encouraging the model to select more concise and purer subgraphs.
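The relationship between R1 and R2 can be sketched as follows. The source does not give the exact reward formulas, so everything here is an assumption: the `Rollout` fields, the all-or-nothing form of `reward_r1`, and the linear size bonus weighted by `size_weight` are illustrative choices, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """One model output (hypothetical structure)."""
    sampled_nodes: frozenset   # nodes the model reports having sampled
    selected_nodes: frozenset  # nodes kept after the select step
    answer: str


def reward_r1(rollout, graph_nodes, gold_answer):
    """Assumed R1: 1.0 only if all three checks pass, otherwise 0.0."""
    truthful = rollout.sampled_nodes <= graph_nodes               # no invented nodes
    consistent = rollout.selected_nodes <= rollout.sampled_nodes  # selection drawn from samples
    correct = rollout.answer == gold_answer
    return 1.0 if (truthful and consistent and correct) else 0.0


def reward_r2(rollout, graph_nodes, gold_answer, max_size, size_weight=0.5):
    """Assumed R2: R1 plus a bonus that grows as the selected subgraph shrinks.

    The bonus applies only when R1 is satisfied, so a small but wrong
    subgraph earns nothing. `max_size` must be positive.
    """
    base = reward_r1(rollout, graph_nodes, gold_answer)
    if base == 0.0:
        return 0.0
    size_ratio = min(len(rollout.selected_nodes), max_size) / max_size
    return base + size_weight * (1.0 - size_ratio)


# Toy rollouts over a six-node graph (illustrative values only).
graph_nodes = frozenset("abcdef")
small = Rollout(frozenset("abc"), frozenset("ab"), "ML")
large = Rollout(frozenset("abcdef"), frozenset("abcde"), "ML")
hallucinated = Rollout(frozenset("abz"), frozenset("ab"), "ML")

r2_small = reward_r2(small, graph_nodes, "ML", max_size=6)
r2_large = reward_r2(large, graph_nodes, "ML", max_size=6)
r1_hallucinated = reward_r1(hallucinated, graph_nodes, "ML")
```

Under these assumptions both correct rollouts satisfy R1, but the one that selects two nodes scores higher under R2 than the one that selects five, and a rollout that references a node missing from the graph scores zero.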

Section 05

Experiment and Evaluation Results

GraphSSR was evaluated on the GOFA benchmark datasets, which cover multi-domain graph data such as academic paper citations, product classification, historical events, medical literature, and knowledge graphs. The reported results show a significant improvement on zero-shot graph learning tasks: the model maintains high accuracy while significantly reducing the size of the subgraphs needed for reasoning, which lowers computational overhead and improves inference efficiency.
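The two quantities discussed here, accuracy and the size of the subgraph used for reasoning, can be tracked together with a small helper. This is only a sketch: the record fields (`prediction`, `label`, `subgraph_size`) are hypothetical, and the toy records below are made-up values, not results from the paper.

```python
def summarize(records):
    """Compute accuracy and mean selected-subgraph size over evaluation records."""
    if not records:
        raise ValueError("records must not be empty")
    n = len(records)
    accuracy = sum(r["prediction"] == r["label"] for r in records) / n
    mean_size = sum(r["subgraph_size"] for r in records) / n
    return {"accuracy": accuracy, "mean_subgraph_size": mean_size}


# Made-up toy records purely to exercise the helper.
toy_records = [
    {"prediction": "ML", "label": "ML", "subgraph_size": 2},
    {"prediction": "DB", "label": "DB", "subgraph_size": 4},
    {"prediction": "ML", "label": "DB", "subgraph_size": 6},
    {"prediction": "CV", "label": "CV", "subgraph_size": 8},
]
summary = summarize(toy_records)
```

Reporting both numbers side by side makes it easy to see whether a drop in subgraph size comes at the cost of accuracy.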

Section 06

Open Source and Reproducibility Support

The project provides the complete datasets (GraphR1 training data, GOFA test data, and pre-generated SFT/RL training data) and pre-trained models, all hosted on the Hugging Face platform. The code repository includes end-to-end instructions for environment configuration, data preparation, model training, and evaluation. Training environments can be deployed quickly with Docker containers, and scripts are provided for automated cluster management and model service deployment.

Section 07

Practical Significance and Outlook

GraphSSR marks a shift from fixed subgraph sampling to adaptive subgraph selection, improving model robustness on noisy graph data and offering a new approach to the deep integration of LLMs with structured data. In practice, its adaptive behavior suits large-scale, high-noise real-world graphs, for example in social network analysis, knowledge graph question answering, and recommendation systems. As LLM capabilities improve, the method is expected to prove useful in more fields.