# TRN-R1-Zero: A New Paradigm for Text-Rich Network Reasoning via Pure Reinforcement Learning

> This article introduces the TRN-R1-Zero framework, which trains large language models (LLMs) for text-rich network reasoning using pure reinforcement learning, without the need for supervised fine-tuning or distillation, achieving cross-domain zero-shot reasoning capabilities.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-21T04:24:46.000Z
- Last activity: 2026-04-22T04:12:49.840Z
- Popularity: 123.2
- Keywords: text-rich networks, reinforcement learning, large language models, zero-shot reasoning, graph neural networks, cross-domain transfer
- Page URL: https://www.zingnex.cn/en/forum/thread/trn-r1-zero
- Canonical: https://www.zingnex.cn/forum/thread/trn-r1-zero
- Markdown source: floors_fallback

---

## Introduction

TRN-R1-Zero trains large language models (LLMs) for text-rich network reasoning using pure reinforcement learning, without supervised fine-tuning or distillation, and achieves cross-domain zero-shot reasoning. Traditional GNNs depend on supervised learning, while existing LLM approaches either ignore graph structure or rely on distilled data; to address both problems, the framework introduces a Neighbor-aware Group Relative Policy Optimization (NG-RPO) mechanism. It performs strongly on multiple benchmarks, demonstrating general network reasoning capability.

## Background and Challenges: Dilemmas in Text-Rich Network Reasoning

In practice, a large amount of data takes the form of Text-Rich Networks (TRNs), such as citation, social, and product co-purchase networks, which require integrating text semantics with topological structure. Traditional GNNs rely on supervised learning and generalize poorly; existing LLM methods either ignore graph structure or depend on distilled chain-of-thought data, which is costly to produce and limits generalization. The key challenge is achieving zero-shot reasoning and cross-domain transfer.
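To make the setting concrete, a minimal sketch of how a TRN node and its neighborhood might be flattened into a text prompt for an LLM. The schema and prompt template here are illustrative assumptions, not the paper's actual format:

```python
def build_node_prompt(node_text, neighbor_texts, task="node classification"):
    """Flatten a target node and its 1-hop neighbor texts into a single prompt.

    Hypothetical template: the paper does not specify this exact layout.
    """
    lines = [f"Task: {task}", f"Target node: {node_text}", "Neighbors:"]
    lines += [f"- {t}" for t in neighbor_texts]
    lines.append("Answer with the most likely label, citing which neighbors informed it.")
    return "\n".join(lines)

# Example: a citation-network node with two cited papers as neighbors.
prompt = build_node_prompt(
    "Paper: a survey of graph neural networks",
    ["Paper: semi-supervised classification with GCNs",
     "Paper: inductive representation learning on large graphs"],
)
```

Serializing structure into text this way is what lets a single LLM handle both the semantics and the topology, which pure GNN embeddings keep separate.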

## TRN-R1-Zero Framework: Pure Reinforcement Learning Design and NG-RPO Mechanism

TRN-R1-Zero is a pure reinforcement learning post-training framework that abandons supervised fine-tuning and distillation. Its core mechanism, NG-RPO, quantifies the contribution of neighbor information with a marginal gain metric and adjusts rewards dynamically: reasoning that reaches the correct answer by exploiting valuable neighbor information earns a higher reward. This guides the model to attend selectively to useful neighbors, enabling dynamic adaptation and improving interpretability.

## Experimental Validation: Breakthrough Performance in Cross-Domain Zero-Shot Reasoning

On benchmarks spanning citation (Cora, PubMed), social (Facebook, Twitter), and product co-purchase networks, TRN-R1-Zero significantly outperforms existing methods. Its cross-domain transfer is notable: trained only on node-level tasks, it handles edge-level tasks (predicting social relationships) and graph-level tasks (evaluating community attributes), achieving zero-shot cross-task reasoning and suggesting it learns general rules rather than dataset-specific shortcuts.
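One way to see why node-level training can transfer zero-shot is that edge- and graph-level tasks can be recast in the same textual format the model already knows. A minimal sketch, with templates that are illustrative assumptions rather than the paper's actual prompts:

```python
def edge_prompt(text_u, text_v):
    """Edge-level task phrased as text (hypothetical template)."""
    return (
        "Task: link prediction\n"
        f"Node A: {text_u}\n"
        f"Node B: {text_v}\n"
        "Question: are these two nodes likely connected? Answer yes or no, with reasoning."
    )

def graph_prompt(node_texts, attribute):
    """Graph-level task phrased as text (hypothetical template)."""
    members = "\n".join(f"- {t}" for t in node_texts)
    return (
        "Task: community attribute assessment\n"
        f"Community members:\n{members}\n"
        f"Question: does this community exhibit '{attribute}'? Explain briefly."
    )
```

Because all three task levels reduce to text-plus-structure prompts, a policy trained to weigh neighbor evidence at the node level has a plausible path to the other levels without new labels.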

## Comparative Analysis: Core Advantages of TRN-R1-Zero

- Versus traditional GNNs: zero-shot generalization and cross-domain transfer without per-dataset training.
- Versus other LLM methods: pure RL avoids overfitting to distilled data and can explore strategies that surpass teacher models.
- Versus structure-agnostic LLMs: NG-RPO explicitly models neighbor value, filling the gap left by methods that ignore graph structure.

## Limitations and Future Directions: Improvement Areas for TRN-R1-Zero

Limitations: RL training is computationally expensive, the framework currently applies only to homogeneous networks, and interpretability can be further improved. Future directions: optimize computational efficiency, extend to heterogeneous networks, and increase model transparency and interpretability.

## Conclusion: Towards a New Paradigm for General Network Intelligence

TRN-R1-Zero is a breakthrough in text-rich network reasoning: it endows LLMs with network reasoning capability and achieves cross-domain zero-shot reasoning, offering a new direction toward general AI. It is expected to find applications in recommendation systems, knowledge discovery, and social network analysis, unlocking the value of network data.
