# Application of Graph Neural Networks in Academic Paper Classification: Combining Text Mining and Citation Network Structure

> This article introduces a Graph Neural Network (GNN)-based academic paper classification system that combines text features with citation network structure. Its improved GAT model reaches 82.50% classification accuracy on the Cora dataset.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-11T16:55:45.000Z
- Last activity: 2026-05-11T16:59:34.442Z
- Popularity: 159.9
- Keywords: Graph Neural Networks, Text Mining, Academic Paper Classification, GCN, GAT, Citation Network, PyTorch, Machine Learning
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-alzoubitoqa-graph-based-text-mining-for-research-paper-classification
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-alzoubitoqa-graph-based-text-mining-for-research-paper-classification

---

## Introduction: Graph Neural Networks Combine Text and Citation Networks to Improve Academic Paper Classification Performance

This article introduces a Graph Neural Network (GNN)-based academic paper classification system that integrates text features with citation network structure. It compares GCN, GAT, and GATv2 models and applies training optimizations such as early stopping and learning rate scheduling. The improved GAT model achieves 82.50% classification accuracy on the Cora dataset, which the author presents as an improvement over traditional methods that classify each document in isolation.

## Research Background and Motivation

Automatic classification of academic papers is a core function of digital libraries and academic search engines. Traditional methods (such as Naive Bayes, SVM, and deep learning text models) treat each paper as an isolated document and ignore the citation relationships between papers. Graph Neural Networks (GNNs) operate directly on graph-structured data: through message passing, each node aggregates information from its neighbors, which offers a way to make use of the citation network.
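To make "aggregating neighbor information through message passing" concrete, here is a minimal sketch in plain Python. The four-node graph, the feature values, and the function name `mean_aggregate` are all made up for illustration; real GNN layers also apply learned weights and a nonlinearity, which are left out here.

```python
# Toy citation graph: node id -> list of neighbor ids (undirected).
# The graph and the feature values are invented for illustration.
neighbors = {
    0: [1, 2],
    1: [0],
    2: [0, 3],
    3: [2],
}

# One small feature vector per paper (a stand-in for bag-of-words features).
features = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
    3: [0.0, 0.0],
}


def mean_aggregate(neighbors, features):
    """One round of message passing: each node averages its own
    feature vector with those of its direct neighbors."""
    updated = {}
    for node, nbrs in neighbors.items():
        group = [features[node]] + [features[n] for n in nbrs]
        dim = len(features[node])
        updated[node] = [sum(vec[k] for vec in group) / len(group) for k in range(dim)]
    return updated


one_hop = mean_aggregate(neighbors, features)   # each node sees direct neighbors
two_hop = mean_aggregate(neighbors, one_hop)    # information from two hops away
```

Node 3 starts with an all-zero feature vector, yet after one round it holds a non-zero vector inherited from node 2, and after two rounds it also reflects node 0. This is the sense in which a paper with uninformative text can still be classified through the papers it is linked to.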

## Experimental Dataset: Cora Citation Network

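This study uses the Cora citation network dataset, which contains 2,708 papers, each described by a 1,433-dimensional bag-of-words feature vector and labeled with one of 7 academic categories, connected by 10,556 citation edges. The edge count matches the figure reported by PyTorch Geometric's version of Cora, where each undirected link is stored once per direction, so it likely corresponds to 5,278 distinct paper pairs. Because the dataset provides both text content and citation relationships, it is well suited to testing GNNs on academic classification.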
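The two ingredients of such a dataset, a bag-of-words vector per paper and a list of citation edges, can be sketched as follows. This is an illustration under stated assumptions, not the study's preprocessing: the three-paper corpus, the four-word vocabulary, and the helper names `bag_of_words` and `symmetric_edges` are hypothetical, and the binary encoding and the both-directions edge storage are common conventions assumed here.

```python
def bag_of_words(text, vocabulary):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


def symmetric_edges(citations):
    """Store every citation (src, dst) in both directions, without duplicates
    or self-loops, so the graph can be treated as undirected."""
    edges = set()
    for src, dst in citations:
        if src != dst:
            edges.add((src, dst))
            edges.add((dst, src))
    return sorted(edges)


# Hypothetical corpus; Cora itself has 2708 papers and a 1433-word vocabulary.
vocabulary = ["graph", "neural", "bayesian", "genetic"]
papers = [
    "Graph neural models for citation data",
    "A Bayesian view of neural learning",
    "Genetic search over graph structures",
]
citations = [(1, 0), (2, 0), (0, 2)]  # (citing paper, cited paper)

x = [bag_of_words(p, vocabulary) for p in papers]  # node feature matrix
edge_list = symmetric_edges(citations)             # directed pairs, both ways
```

Here papers 0 and 2 cite each other, but the pair is stored only once per direction, so three raw citations become four directed entries describing two undirected links.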

## Methodology: Graph Representation Learning and Model Comparison

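The citation network is modeled as an undirected graph in which nodes are papers and edges are citations. Each node carries its text features, so the models use both content and graph structure. The experiments compare three models:
1. GCN: Aggregates neighborhood information with a degree-normalized graph convolution, derived as a first-order approximation of spectral graph convolution;
2. GAT: Introduces an attention mechanism that learns a weight for each neighbor instead of using fixed, degree-based weights;
3. GATv2: Reorders the operations in GAT's attention scoring function so that the ranking of neighbors can depend on the querying node (dynamic attention), which increases expressive power over the static attention of the original GAT.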
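The difference between the three models comes down to how each one weights a neighbor. The sketch below writes out the three rules in plain Python, following the formulas from the original GCN, GAT, and GATv2 papers. It is not the article's implementation: the function names are illustrative, and the weights `W` and `a` are hand-picked (not learned) so that the static versus dynamic attention behavior is visible on a two-neighbor example.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gcn_weight(deg_i, deg_j):
    """GCN: fixed weight from node degrees (self-loops added), nothing learned per edge."""
    return 1.0 / math.sqrt((deg_i + 1) * (deg_j + 1))


def gat_score(W, a, h_i, h_j):
    """GAT: e_ij = LeakyReLU(a^T [W h_i || W h_j])."""
    return leaky_relu(dot(a, matvec(W, h_i) + matvec(W, h_j)))


def gatv2_score(W, a, h_i, h_j):
    """GATv2: e_ij = a^T LeakyReLU(W [h_i || h_j])."""
    return dot(a, [leaky_relu(z) for z in matvec(W, h_i + h_j)])


def attention(score_fn, W, a, h_query, h_neighbors):
    """Softmax-normalized attention of one query node over its neighbors."""
    return softmax([score_fn(W, a, h_query, h_n) for h_n in h_neighbors])


def top(weights):
    """Index of the neighbor that receives the largest attention weight."""
    return weights.index(max(weights))


# Two candidate neighbors with 1-D features, and two different query nodes.
neighbors = [[1.0], [-1.0]]
gat_W, gat_a = [[1.0]], [1.0, 1.0]                          # hand-picked, not learned
v2_W, v2_a = [[1.0, -1.0], [-1.0, 1.0]], [-1.0, -1.0]       # hand-picked, not learned

gat_pos = attention(gat_score, gat_W, gat_a, [1.0], neighbors)
gat_neg = attention(gat_score, gat_W, gat_a, [-1.0], neighbors)
v2_pos = attention(gatv2_score, v2_W, v2_a, [1.0], neighbors)
v2_neg = attention(gatv2_score, v2_W, v2_a, [-1.0], neighbors)
```

With these parameters, GAT ranks the first neighbor highest for both query nodes, because the query only shifts all scores by a constant before a monotonic LeakyReLU. GATv2 prefers the first neighbor for the query `[1.0]` and the second for the query `[-1.0]`, since applying the nonlinearity before the final dot product lets the score depend on how query and neighbor interact.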

## Experimental Design and Result Analysis

In baseline experiments, GCN reached 80.50% accuracy and GAT 79.85%. After adding early stopping, learning rate scheduling, weight decay, and other optimizations, the improved GAT reached 82.50%. GATv2 was tested across five random seeds (42, 7, 123, 2024, 2026) and showed stable results, with a maximum accuracy of 82.10%.
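The article does not specify how early stopping or the learning rate schedule were configured, so the following is only a sketch of how the two typically interact. The plateau-based halving rule, the patience values, the class and function names, and the validation-loss curve are all assumptions made for illustration; the losses are invented and are not results from the study.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_losses, lr=0.01, patience=3, lr_patience=2, factor=0.5):
    """Replay a validation-loss curve, multiplying the learning rate by `factor`
    after every `lr_patience` epochs without improvement, and stopping after
    `patience` such epochs."""
    stopper = EarlyStopping(patience=patience)
    history = []
    for epoch, loss in enumerate(val_losses):
        stop = stopper.step(epoch, loss)
        if stopper.bad_epochs > 0 and stopper.bad_epochs % lr_patience == 0:
            lr *= factor
        history.append((epoch, loss, lr))
        if stop:
            break
    return stopper.best_epoch, history


# Made-up validation losses, for illustration only.
val_losses = [1.90, 1.40, 1.10, 0.95, 0.97, 0.96, 0.99, 0.98, 0.90]
best_epoch, history = run(val_losses)
```

On this curve the best epoch is 3, the learning rate is halved at epoch 5 after two epochs without improvement, and training stops at epoch 6. The lower loss at epoch 8 is never reached, which shows the trade-off in choosing the patience: a small value saves compute and limits overfitting but can stop before a late improvement.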

## Key Findings and Insights

1. Citation relationships carry information that text alone cannot capture, such as domain context and implicit associations between papers;
2. GAT's attention weights can help explain classification decisions by showing which citation relationships the model relied on most;
3. Training strategies (early stopping, learning rate scheduling, weight decay) have a substantial effect on model performance: here they lifted GAT from 79.85% to 82.50%.

## Application Scenarios and Extension Directions

Application scenarios include academic search engine optimization, research trend analysis, and personalized recommendation systems. Possible extensions include replacing bag-of-words features with embeddings from pre-trained language models, heterogeneous graph modeling (authors, institutions, journals), dynamic graph networks, and scaling to larger citation graphs.

## Research Conclusions

This study supports the effectiveness of GNNs for academic paper classification. Its core contributions are a method that integrates text features with graph structure, a systematic comparison of three GNN models, practical guidance on training strategies, and a complete, reproducible code implementation. The 82.50% accuracy of the improved GAT model on Cora indicates that combining text and citation structure is a sound basis for intelligent academic information management systems.
