GraphSSR: Adaptive Subgraph Denoising Framework for Zero-Shot Graph Learning with Large Language Models

GraphSSR, a paper accepted at ACM SIGKDD 2026, comes with an open-source implementation. It performs adaptive subgraph sampling and denoising through two-stage reinforcement learning, addressing the noise sensitivity of large language models in graph learning.

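Tags: GraphSSR, graph learning, large language models, subgraph denoising, zero-shot learning, reinforcement learning, graph neural networks, ACM SIGKDD, adaptive sampling, knowledge graphs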

Published 2026-05-31 17:45 | Recent activity 2026-05-31 17:49 | Estimated read 7 min

Section 01

GraphSSR: Guide to the LLM Adaptive Subgraph Denoising Framework for Zero-Shot Graph Learning

GraphSSR is a paper accepted at ACM SIGKDD 2026 with an open-source implementation. The framework performs adaptive subgraph sampling and denoising through two-stage reinforcement learning. It addresses the noise sensitivity of large language models (LLMs) in graph learning and is particularly suited to zero-shot graph learning scenarios. The original author is mysteriouslfz, and the project is hosted on GitHub (https://github.com/mysteriouslfz/GraphSSR), released on 2026-05-31.

Section 02

Research Background and Challenges

Combining Graph Neural Networks (GNNs) with Large Language Models (LLMs) is an important direction for processing graph-structured data. However, real-world graph data often contains many noisy nodes and edges, which seriously degrades inference performance. Traditional fixed-size subgraph sampling cannot adapt to problem complexity: simple problems need only a few nodes, while complex problems need a larger context. In zero-shot graph learning, models must reason over unseen graph data, which places higher demands on the accuracy of subgraph sampling and on denoising. The key challenge is therefore how to adjust the sampling range dynamically while filtering out noise.

Section 03

Core Idea of GraphSSR

GraphSSR (Adaptive Subgraph Denoising via Sample-Select-Reason) proposes a new adaptive subgraph denoising paradigm. The core insight is that problems of different difficulty require subgraphs of different sizes: oversized subgraphs tend to contain more noise, while subgraphs that are too small can leave out context the answer depends on. The model follows a three-stage 'Sample-Select-Reason' process. It first samples candidate subgraphs, then evaluates them and selects the best one, and finally reasons over the selected subgraph, explicitly balancing subgraph completeness against purity.
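The Sample-Select-Reason flow can be illustrated with a toy sketch. This is not the project's implementation: in GraphSSR the stages are carried out by the trained LLM, whereas here `sample_candidates`, `select_subgraph`, the hop-based sampling, and the `score` and `reason` callables are all hypothetical stand-ins chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the graph is a plain adjacency mapping, node id -> neighbour ids.
Graph = dict[str, set[str]]


@dataclass(frozen=True)
class Subgraph:
    nodes: frozenset[str]


def sample_candidates(graph: Graph, center: str, max_hops: int) -> list[Subgraph]:
    """Sample: one candidate per hop radius, from 0 up to max_hops."""
    visited = {center}
    frontier = {center}
    candidates = [Subgraph(frozenset(visited))]
    for _ in range(max_hops):
        # Expand the frontier by one hop, skipping nodes already included.
        frontier = {n for u in frontier for n in graph.get(u, set())} - visited
        if not frontier:
            break
        visited |= frontier
        candidates.append(Subgraph(frozenset(visited)))
    return candidates


def select_subgraph(
    candidates: list[Subgraph], score: Callable[[Subgraph], float]
) -> Subgraph:
    """Select: highest score wins; ties go to the smaller subgraph."""
    return max(candidates, key=lambda s: (score(s), -len(s.nodes)))


def sample_select_reason(graph, center, score, reason, max_hops=2):
    """Run the three stages in order and return (chosen subgraph, answer)."""
    candidates = sample_candidates(graph, center, max_hops)
    chosen = select_subgraph(candidates, score)
    return chosen, reason(chosen)  # Reason: only the chosen subgraph is used


# Toy usage: p1 and p2 are useful context, everything else counts as noise.
graph = {"p1": {"p2", "p3"}, "p2": {"p1", "p4"}, "p3": {"p1"}, "p4": {"p2"}}
relevant = {"p1", "p2"}


def score(s: Subgraph) -> float:
    # Reward covered relevant nodes (completeness), penalise noise (purity).
    return len(s.nodes & relevant) - 0.5 * len(s.nodes - relevant)


candidates = sample_candidates(graph, "p1", max_hops=2)
chosen, answer = sample_select_reason(graph, "p1", score, lambda s: sorted(s.nodes))
print(sorted(chosen.nodes))
```

In this toy setup the 1-hop candidate wins: the 0-hop candidate misses a relevant node, and the 2-hop candidate adds only noise.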

Section 04

Technical Architecture and Training Process

GraphSSR training is divided into two stages: Supervised Fine-Tuning (SSR-SFT) and Reinforcement Learning (SSR-RL):

SSR-SFT Stage: The goal is to master basic subgraph reasoning capabilities. Training samples are constructed using the GraphR1 dataset. The teacher model generates high-quality reasoning trajectories (filtered by answer correctness and structural diversity). Distributed training is performed using the LlamaFactory framework, and vLLM is used to deploy the teacher model and diversity evaluation model.
SSR-RL Stage: Implemented with the verl framework and divided into two sub-stages:
- Truthfulness Reinforcement Learning: The reward function R1 enforces subgraph truthfulness, selection consistency, and answer correctness.
- Denoising Reinforcement Learning: The reward function R2 adds a subgraph size reward to R1 (the smaller the subgraph when the answer is correct, the higher the reward), encouraging the selection of more concise and pure subgraphs.
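The relationship between the two rewards can be sketched as follows. The exact formulas are not given here, so everything in this block is an assumption made for illustration: the three checks are treated as boolean inputs, R1 is gated so that any failed check yields zero, and the size term is a linear bonus scaled by a hypothetical `size_weight`.

```python
def reward_r1(truthful: bool, consistent: bool, correct: bool) -> float:
    """Sketch of R1 (assumed form): all three checks must pass.

    truthful   -- the selected subgraph only contains real nodes and edges
    consistent -- the selection matches what was sampled
    correct    -- the final answer matches the label
    """
    return 1.0 if (truthful and consistent and correct) else 0.0


def reward_r2(
    truthful: bool,
    consistent: bool,
    correct: bool,
    subgraph_size: int,
    max_size: int,
    size_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Sketch of R2 (assumed form): R1 plus a bonus for smaller subgraphs.

    The bonus is only granted when R1 is earned, so a small but wrong
    subgraph is never rewarded for being small.
    """
    if max_size <= 0 or not 0 <= subgraph_size <= max_size:
        raise ValueError("expected 0 <= subgraph_size <= max_size and max_size > 0")
    base = reward_r1(truthful, consistent, correct)
    if base == 0.0:
        return 0.0
    return base + size_weight * (1.0 - subgraph_size / max_size)


small = reward_r2(True, True, True, subgraph_size=2, max_size=10)
large = reward_r2(True, True, True, subgraph_size=8, max_size=10)
wrong = reward_r2(True, True, False, subgraph_size=2, max_size=10)
print(small > large > wrong)
```

Gating the size bonus on correctness is the design point worth noting: without it, the policy could maximise reward by always picking the smallest subgraph regardless of whether it supports the answer.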

Section 05

Experiment and Evaluation Results

GraphSSR was evaluated on the GOFA benchmark dataset, which covers multi-domain graph data such as academic paper citations, product classification, historical events, medical literature, and knowledge graphs. The results show a significant improvement on zero-shot graph learning tasks. The model maintains high accuracy while significantly reducing the size of the subgraphs needed for reasoning, which lowers computational overhead and improves inference efficiency.
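The two quantities discussed above, accuracy and the size of the subgraph used for reasoning, can be tracked together with a few lines of code. This is a minimal sketch, not the project's evaluation script: the `Prediction` record and `summarize` helper are hypothetical, and the four records are made-up toy values, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    predicted: str       # label produced by the model
    gold: str            # reference label
    subgraph_size: int   # number of nodes in the subgraph used for reasoning


def summarize(preds: list[Prediction]) -> dict[str, float]:
    """Report accuracy alongside the mean subgraph size."""
    if not preds:
        raise ValueError("no predictions to summarize")
    n = len(preds)
    accuracy = sum(p.predicted == p.gold for p in preds) / n
    mean_size = sum(p.subgraph_size for p in preds) / n
    return {"accuracy": accuracy, "mean_subgraph_size": mean_size}


# Toy records for illustration only; these are not reported numbers.
toy = [
    Prediction("cs.LG", "cs.LG", 3),
    Prediction("cs.CL", "cs.CL", 5),
    Prediction("cs.CV", "cs.LG", 2),
    Prediction("cs.DB", "cs.DB", 6),
]
summary = summarize(toy)
print(summary)
```

Reporting both numbers side by side makes the trade-off visible: a method that only shrinks subgraphs, or only raises accuracy, shows up immediately.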

Section 06

Open Source and Reproducibility Support

The project provides complete datasets (GraphR1 training data, GOFA test data, and pre-generated SFT/RL training data) and pre-trained models, hosted on the Hugging Face platform. The code repository includes full-process instructions for environment configuration, data preparation, model training, and evaluation. It supports rapid deployment of training environments using Docker containers and provides scripts for automated cluster management and model service deployment.

Section 07

Practical Significance and Outlook

GraphSSR marks a shift from fixed subgraph sampling to adaptive subgraph selection, improving model robustness on noisy graph data and offering new ideas for the deep integration of LLMs and structured data. In practice, its adaptive behavior suits large-scale, high-noise real-world graph data, for example in social network analysis, knowledge graph question answering, and recommendation systems. As LLM capabilities improve, the method is expected to prove useful in more fields.
