# GraphSSR: Adaptive Subgraph Denoising Framework for Zero-Shot Graph Learning with Large Language Models

> GraphSSR, a paper accepted at ACM SIGKDD 2026, comes with an open-source implementation. It performs adaptive subgraph sampling and denoising, trained with two-stage reinforcement learning, to address the noise sensitivity of large language models in graph learning.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-31T09:45:45.000Z
- Last activity: 2026-05-31T09:49:19.625Z
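- Keywords: GraphSSR, graph learning, large language models, subgraph denoising, zero-shot learning, reinforcement learning, graph neural networks, ACM SIGKDD, adaptive sampling, knowledge graphs
- Page link: https://www.zingnex.cn/en/forum/thread/graphssr-4920d8f6
- Canonical: https://www.zingnex.cn/forum/thread/graphssr-4920d8f6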

---

## GraphSSR: An Adaptive Subgraph Denoising Framework for LLM-Based Zero-Shot Graph Learning

GraphSSR is a framework for graph learning with large language models (LLMs), described in a paper accepted at ACM SIGKDD 2026. It targets the noise sensitivity of LLMs on graph data: subgraphs are sampled and denoised adaptively, and this behavior is trained with two-stage reinforcement learning. The approach is aimed in particular at zero-shot graph learning. The original author is mysteriouslfz, and the open-source implementation is hosted on GitHub (https://github.com/mysteriouslfz/GraphSSR), released on 2026-05-31.

## Research Background and Challenges

Combining Graph Neural Networks (GNNs) with Large Language Models (LLMs) is an important direction for processing graph-structured data. Real-world graphs, however, often contain many noisy nodes and edges, which degrades inference quality. Traditional fixed-size subgraph sampling cannot adapt to the complexity of the question: a simple question needs only a few nodes, while a complex one needs a larger context. Zero-shot graph learning raises the bar further, because the model must reason over graph data it has never seen, so both sampling accuracy and denoising matter more. The key challenge is therefore how to adjust the sampling range dynamically while filtering out noise.
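
To make the limitation of fixed-size sampling concrete, here is a minimal sketch of plain k-hop neighborhood sampling. It is not taken from the GraphSSR repository: the function `k_hop_nodes` and the toy graph are hypothetical, and the only point is that a single fixed radius either misses context or pulls in unrelated nodes through a hub.

```python
from collections import deque


def k_hop_nodes(adj, center, k):
    """Return the set of nodes within k hops of `center` (breadth-first search)."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the requested radius
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


# Hypothetical toy graph: "p0" has two relevant neighbors; one of them touches
# a hub that drags in four unrelated nodes ("n1".."n4").
toy_adj = {
    "p0": ["p1", "p2"],
    "p1": ["p0", "hub"],
    "p2": ["p0"],
    "hub": ["p1", "n1", "n2", "n3", "n4"],
    "n1": ["hub"], "n2": ["hub"], "n3": ["hub"], "n4": ["hub"],
}

sizes = {k: len(k_hop_nodes(toy_adj, "p0", k)) for k in (1, 2, 3)}
print(sizes)  # {1: 3, 2: 4, 3: 8}
```

In this toy case the subgraph doubles in size between radius 2 and radius 3, and every added node is noise. No single radius fits every question, which is the gap adaptive selection is meant to close.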

## Core Idea of GraphSSR

GraphSSR (Adaptive Subgraph Denoising via Sample-Select-Reason) proposes a new adaptive subgraph denoising paradigm. The core insight is that questions of different difficulty need subgraphs of different sizes, and that an oversized subgraph tends to carry more noise. The model follows a three-stage Sample-Select-Reason process: it first samples candidate subgraphs, then evaluates them and selects the best one, and finally reasons over the selected subgraph. This makes the trade-off between subgraph completeness and purity explicit.
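
The control flow of Sample-Select-Reason can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `sample_select_reason`, `toy_sample`, `toy_score`, and `toy_reason` are hypothetical, and the three steps are modeled here as plain callables with hand-written stand-ins, whereas the real framework trains a model to carry them out.

```python
from typing import Callable, Dict, FrozenSet, List

Subgraph = FrozenSet[str]  # a subgraph is represented by its node set here


def sample_select_reason(
    question: str,
    sample: Callable[[str], List[Subgraph]],
    score: Callable[[str, Subgraph], float],
    reason: Callable[[str, Subgraph], str],
) -> Dict[str, object]:
    """Run the three steps in order: sample candidates, select one, reason on it."""
    candidates = sample(question)
    if not candidates:
        raise ValueError("sampler returned no candidate subgraphs")
    best = max(candidates, key=lambda sg: score(question, sg))
    return {"subgraph": best, "answer": reason(question, best)}


# Hypothetical stand-ins for illustration only.
RELEVANT = frozenset({"p0", "p1", "p2"})


def toy_sample(question: str) -> List[Subgraph]:
    # Three candidates of increasing size; the largest includes noisy nodes.
    return [
        frozenset({"p0"}),
        frozenset({"p0", "p1", "p2"}),
        frozenset({"p0", "p1", "p2", "hub", "n1", "n2"}),
    ]


def toy_score(question: str, sg: Subgraph) -> float:
    # Reward coverage of relevant nodes (completeness), penalize noise (purity).
    return len(sg & RELEVANT) - 0.5 * len(sg - RELEVANT)


def toy_reason(question: str, sg: Subgraph) -> str:
    return f"answer derived from {len(sg)} nodes"


result = sample_select_reason("What is the topic of p0?", toy_sample, toy_score, toy_reason)
print(sorted(result["subgraph"]), "|", result["answer"])
```

With these stand-ins the middle candidate wins: the smallest one is incomplete, and the largest one loses more to its noise penalty than it gains in coverage.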

## Technical Architecture and Training Process

GraphSSR training has two stages, Supervised Fine-Tuning (SSR-SFT) followed by Reinforcement Learning (SSR-RL):
1. **SSR-SFT Stage**: The goal is to teach the model basic subgraph reasoning. Training samples are built from the GraphR1 dataset: a teacher model generates reasoning trajectories, which are then filtered by answer correctness and structural diversity. Distributed training uses the LlamaFactory framework, and vLLM serves both the teacher model and the diversity evaluation model.
2. **SSR-RL Stage**: This stage uses the verl framework and is split into two sub-stages:
   - Truthfulness Reinforcement Learning: The reward function R1 enforces subgraph truthfulness, selection consistency, and answer correctness.
   - Denoising Reinforcement Learning: The reward function R2 adds a subgraph size reward to R1. When the answer is correct, a smaller subgraph earns a higher reward, which encourages the model to select more concise and purer subgraphs.
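
The relationship between the two rewards can be sketched as below. The source only states which properties R1 checks and that R2 adds a size term when the answer is correct; everything else here is an assumption made for illustration: the `Rollout` structure, the reading of truthfulness as "sampled nodes exist in the graph", the reading of consistency as "selected nodes come from the sampled ones", the equal weighting, the linear size bonus, and the parameters `max_size` and `size_weight`.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rollout:
    """One model output (hypothetical structure, not the repository's format)."""
    sampled_nodes: FrozenSet[str]
    selected_nodes: FrozenSet[str]
    answer: str


def is_correct(rollout: Rollout, gold_answer: str) -> bool:
    return rollout.answer.strip().lower() == gold_answer.strip().lower()


def reward_r1(rollout: Rollout, graph_nodes: FrozenSet[str], gold_answer: str) -> float:
    """Assumed R1: equal-weight average of three binary checks."""
    truthful = rollout.sampled_nodes <= graph_nodes  # no invented nodes
    consistent = bool(rollout.selected_nodes) and (
        rollout.selected_nodes <= rollout.sampled_nodes
    )  # selection drawn from what was sampled
    correct = is_correct(rollout, gold_answer)
    return (truthful + consistent + correct) / 3.0


def reward_r2(
    rollout: Rollout,
    graph_nodes: FrozenSet[str],
    gold_answer: str,
    max_size: int = 20,
    size_weight: float = 0.5,
) -> float:
    """Assumed R2: R1 plus a bonus that grows as the selected subgraph shrinks,
    granted only when the answer is correct."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    base = reward_r1(rollout, graph_nodes, gold_answer)
    if not is_correct(rollout, gold_answer):
        return base
    fraction = min(len(rollout.selected_nodes), max_size) / max_size
    return base + size_weight * (1.0 - fraction)


graph_nodes = frozenset({"a", "b", "c", "d", "e"})
sampled = frozenset({"a", "b", "c", "d"})
small = Rollout(sampled, frozenset({"a", "b"}), "Physics")
large = Rollout(sampled, sampled, "physics")
wrong = Rollout(sampled, frozenset({"a"}), "biology")

for name, r in [("small", small), ("large", large), ("wrong", wrong)]:
    print(name, reward_r2(r, graph_nodes, "physics", max_size=4))
```

Gating the size bonus on correctness matters: without it, the policy could raise its reward by shrinking the subgraph until the answer becomes wrong.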

## Experiment and Evaluation Results

GraphSSR was evaluated on the GOFA benchmark, which covers multi-domain graph data such as academic paper citations, product classification, historical events, medical literature, and knowledge graphs. The reported results show a clear improvement on zero-shot graph learning tasks. The model maintains high accuracy while reasoning over much smaller subgraphs, which lowers computational overhead and improves inference efficiency.
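
The two quantities discussed above, accuracy and the size of the subgraph used for reasoning, can be tracked together with a small helper like the one below. This is a sketch, not the project's evaluation script: the record fields (`prediction`, `label`, `num_nodes`) are a hypothetical format, and the sample records are made-up placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict, List


def summarize(records: List[Dict[str, object]]) -> Dict[str, float]:
    """Compute accuracy and mean selected-subgraph size over evaluation records."""
    if not records:
        raise ValueError("no evaluation records")
    accuracy = mean(1.0 if r["prediction"] == r["label"] else 0.0 for r in records)
    avg_nodes = mean(r["num_nodes"] for r in records)
    return {"accuracy": accuracy, "avg_subgraph_nodes": avg_nodes}


# Placeholder records for illustration only (not real benchmark output).
placeholder_records = [
    {"prediction": "cs.LG", "label": "cs.LG", "num_nodes": 3},
    {"prediction": "cs.CL", "label": "cs.CL", "num_nodes": 5},
    {"prediction": "cs.CV", "label": "cs.CV", "num_nodes": 4},
    {"prediction": "cs.DB", "label": "cs.IR", "num_nodes": 8},
]

summary = summarize(placeholder_records)
print(summary)
```

Reporting both numbers side by side makes it visible when an accuracy gain is bought with larger subgraphs, or when smaller subgraphs cost accuracy.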

## Open Source and Reproducibility Support

The project provides complete datasets (GraphR1 training data, GOFA test data, and pre-generated SFT/RL training data) and pre-trained models, hosted on the Hugging Face platform. The code repository includes full-process instructions for environment configuration, data preparation, model training, and evaluation. It supports rapid deployment of training environments using Docker containers and provides scripts for automated cluster management and model service deployment.

## Practical Significance and Outlook

GraphSSR moves from fixed subgraph sampling to adaptive subgraph selection, which improves model robustness on noisy graph data and offers a new approach to integrating LLMs with structured data. In practice, its adaptive behavior suits large-scale, high-noise real-world graphs, for example in social network analysis, knowledge graph question answering, and recommendation systems. As LLM capabilities improve, the method is expected to prove useful in more fields.
