Zing Forum

Performance Benchmarking Study of PyTorch 2.0's torch.compile() on Graph Neural Networks

This article analyzes how PyTorch 2.0's torch.compile() feature performs on Graph Neural Networks (GNNs). It compares inference speed, memory usage, and training efficiency for two major frameworks, PyTorch Geometric and Deep Graph Library, under different compilation modes, giving GNN developers a systematic reference for performance optimization.

Tags: PyTorch, torch.compile, Graph Neural Networks, GNN, PyTorch Geometric, Deep Graph Library, performance optimization, benchmarking, deep learning compilers
Published 2026-05-19 21:13 | Recent activity 2026-05-19 21:19 | Estimated read 8 min

Section 01

[Introduction] Performance Benchmarking Study of PyTorch 2.0's torch.compile() on GNNs

This article presents a systematic study of how PyTorch 2.0's torch.compile() feature performs on Graph Neural Networks (GNNs). It compares the inference speed, memory usage, and training efficiency of two major frameworks, PyTorch Geometric (PyG) and Deep Graph Library (DGL), under different compilation modes. The study is built on the benchmark framework of the gnn-compile-bench project and aims to give GNN developers a reference for performance optimization.

Section 02

Research Background and Motivation

PyTorch 2.0's torch.compile() improves model efficiency through graph compilation. GNNs, however, differ from traditional neural networks in their sparse graph structures and message-passing mechanisms, so it is unclear whether the compiler's optimization strategies carry over to GNN workloads. The gnn-compile-bench project, developed by Sonia Vetter, aims to answer this question by evaluating the PyG and DGL frameworks in depth under a range of compilation configurations.

Section 03

Testing Framework and Key Technical Implementations

Testing Framework and Methodology

  • Supported GNN Models: Four mainstream architectures: GCN, GraphSAGE, GAT, GIN.
  • Datasets: Covers node classification (Cora, PubMed, ogbn-arxiv, etc.), link prediction (ogbl-collab), and knowledge graph completion (ogbl-biokg). The ogbn-products dataset requires mini-batch sampling.
  • Compilation Modes: Compares five modes: Eager (the uncompiled baseline) and four torch.compile() modes, namely Default, Reduce Overhead, Max Autotune, and Max Autotune No Cudagraphs. It also studies how the dynamic parameter (auto/True/False) affects dynamic shape handling.
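The modes and dynamic settings listed above can be enumerated as a configuration grid. The sketch below is a minimal illustration, not the project's own code: the helper names build_configs and apply_config are hypothetical, and crossing every compiled mode with every dynamic setting is an assumption about how the experiments are organized. The mode strings are the ones torch.compile() accepts, and "auto" is mapped to dynamic=None, which is torch.compile()'s default of detecting dynamism automatically.

```python
# Mode strings accepted by torch.compile(mode=...). None stands for the
# uncompiled eager baseline (a convention of this sketch only).
COMPILE_MODES = [
    None,
    "default",
    "reduce-overhead",
    "max-autotune",
    "max-autotune-no-cudagraphs",
]

# "auto" corresponds to dynamic=None, the torch.compile() default.
DYNAMIC_SETTINGS = {"auto": None, "True": True, "False": False}


def build_configs():
    """Return one config per eager baseline and per (mode, dynamic) pair."""
    configs = []
    for mode in COMPILE_MODES:
        if mode is None:
            configs.append({"label": "eager", "compile_kwargs": None})
            continue
        for name, dynamic in DYNAMIC_SETTINGS.items():
            configs.append({
                "label": f"{mode}/dynamic={name}",
                "compile_kwargs": {"mode": mode, "dynamic": dynamic},
            })
    return configs


def apply_config(model, config):
    """Return the model unchanged for eager, or wrapped by torch.compile()."""
    if config["compile_kwargs"] is None:
        return model
    import torch  # imported lazily so the grid can be built without PyTorch
    return torch.compile(model, **config["compile_kwargs"])
```

Under these assumptions the grid holds 13 configurations per model and dataset: one eager baseline plus four compiled modes times three dynamic settings.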

Key Technical Details

  • Dynamic Shape Handling: Three settings of the dynamic parameter (auto/True/False) are tested to handle the varying input shapes of GNN workloads.
  • Fixes: GCN self-loop addition and normalization calculations were corrected to keep the test results accurate.
  • Automation Scripts: run_experiments.sh (full experiments), run_gin_accuracy.sh (GIN accuracy test), run_products.sh (large-scale dataset test).
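For context on the GCN fix mentioned above: the standard GCN layer adds a self-loop to every node and then applies symmetric normalization, giving each edge (i, j) the weight 1 / sqrt(deg(i) * deg(j)), where degrees are counted after the self-loops are added. The sketch below illustrates that textbook computation in plain Python. It is not the project's fix, the function name gcn_normalize is hypothetical, and it assumes an undirected graph stored as edges in both directions.

```python
import math


def gcn_normalize(edges, num_nodes):
    """Return {(src, dst): weight} for D^-1/2 (A + I) D^-1/2.

    Assumes an undirected graph given as directed edges in both directions.
    """
    # Drop existing self-loops first so each node ends up with exactly one.
    edge_set = {(s, d) for s, d in edges if s != d}
    edge_set |= {(i, i) for i in range(num_nodes)}

    # Degree of each node, counted over incoming edges including the self-loop.
    deg = [0] * num_nodes
    for _, dst in edge_set:
        deg[dst] += 1

    # Symmetric normalization: scale each edge by both endpoint degrees.
    return {(s, d): 1.0 / math.sqrt(deg[s] * deg[d]) for s, d in edge_set}
```

Two common mistakes this guards against are counting a self-loop twice when the input already contains one, and computing degrees before the self-loops are added.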

Section 04

Test Results and Key Findings

Overall Performance

508 of 580 experiments (about 87.6%) succeeded. Out-of-memory (OOM) errors were the main cause of the 72 failures, which suggests that torch.compile() is reasonably stable in GNN scenarios.
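A benchmark sweep of this size needs to record OOM failures without aborting the whole run. The sketch below shows one way to do that, under stated assumptions: the helper names run_safely and success_rate are hypothetical, and OOM is detected by matching "out of memory" in the message of a RuntimeError, which is how PyTorch's CUDA OOM errors normally present themselves.

```python
def run_safely(experiment):
    """Run one experiment callable and classify the outcome."""
    try:
        return {"status": "ok", "result": experiment()}
    except MemoryError as exc:
        # Host-side memory exhaustion.
        return {"status": "oom", "error": str(exc)}
    except RuntimeError as exc:
        # Assumption: GPU OOM surfaces as a RuntimeError (or subclass)
        # whose message mentions "out of memory".
        if "out of memory" in str(exc).lower():
            return {"status": "oom", "error": str(exc)}
        raise  # any other failure is a real bug and should not be hidden


def success_rate(records):
    """Fraction of records whose status is 'ok'."""
    ok = sum(1 for r in records if r["status"] == "ok")
    return ok / len(records)
```

Applied to the counts reported above, 508 successes out of 580 runs give a success rate of 508 / 580, or roughly 0.876.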

Inference Latency

Most compilation modes reduce inference latency:

  • PyG: Max Autotune mode delivers a significant speedup.
  • DGL: Eager mode is already well optimized, but compilation still provides additional gains.
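Latency comparisons like these depend on excluding compilation time, since a compiled model pays its compilation cost on the first calls. The following is a minimal timing sketch, not the project's harness: measure_latency and its parameters are hypothetical. On a GPU, the sync argument would typically be torch.cuda.synchronize, so that the timer waits for queued kernels to finish.

```python
import statistics
import time


def measure_latency(fn, warmup=3, iters=10, sync=None):
    """Time fn() after warm-up calls and return summary statistics in ms."""
    # Warm-up calls absorb one-time costs such as compilation and autotuning.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "iters": iters,
    }
```

The median is reported rather than the mean because a few slow outliers, for example from a recompilation triggered by a new input shape, would otherwise distort the result.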

Memory Usage

  • Compiled modes may increase memory overhead, so speed gains must be weighed against memory constraints.
  • DGL is the more memory-efficient framework; Max Autotune mode has the highest memory usage.
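Peak GPU memory for a single run can be read from PyTorch's CUDA memory statistics. The sketch below is one possible way to collect it, not necessarily how the study did: the function name peak_gpu_memory_mb is hypothetical, and the helper returns None without calling fn when PyTorch or a CUDA device is unavailable.

```python
def peak_gpu_memory_mb(fn):
    """Return peak CUDA memory allocated while running fn(), in MiB.

    Returns None (and does not call fn) if PyTorch or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()  # start from a clean peak counter
    fn()
    torch.cuda.synchronize()  # make sure all queued kernels have run
    return torch.cuda.max_memory_allocated() / (1024 ** 2)
```

Note that this counts memory allocated through PyTorch's caching allocator on the current device, so it can differ from the figure shown by external tools such as nvidia-smi.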

Framework Comparison

  • DGL's eager mode outperforms PyG's compiled modes in some scenarios.
  • PyG combined with torch.compile() shows large potential for speedup, which suits workloads where maximum performance is the priority.
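Comparisons of this kind reduce to two calculations: the speedup of each compiled mode over the same framework's eager baseline, and the fastest configuration overall. The sketch below illustrates both. The function names and the (framework, mode) key layout are assumptions made for illustration, not the project's result format.

```python
def speedup_vs_eager(latency_ms):
    """Map (framework, mode) -> eager latency / compiled latency.

    latency_ms maps (framework, mode) to a median latency in milliseconds.
    Values above 1.0 mean the compiled mode is faster than eager.
    """
    speedups = {}
    for (framework, mode), value in latency_ms.items():
        baseline = latency_ms.get((framework, "eager"))
        if mode == "eager" or baseline is None:
            continue  # nothing to compare against
        speedups[(framework, mode)] = baseline / value
    return speedups


def fastest_config(latency_ms):
    """Return the (framework, mode) key with the lowest latency."""
    return min(latency_ms, key=latency_ms.get)
```

Keeping the two views separate matters here: a framework can show the largest relative speedup from compilation while another framework's eager mode remains the fastest in absolute terms.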

Section 05

Practical Application Value

Guidance for Developers

  1. Optimization Path: The results clarify what torch.compile() delivers in GNN scenarios, helping developers decide whether compilation optimization is worth the investment.
  2. Framework Selection: Choose PyG or DGL based on performance requirements.
  3. Deployment Strategy: Formulate production deployment plans based on the trade-off between memory and latency.

Contribution to the PyTorch Ecosystem

The study verifies the effectiveness of torch.compile() in GNN scenarios, can help the PyTorch team identify optimization opportunities, and encourages further improvements in compilation technology.

Section 06

Reproducibility and Extensibility

Reproducibility

  • Environment Configuration: Provides a conda environment file (environment.yml), a pip package list (pip_packages.txt), and a snapshot of system information.
  • Analysis Tools: The analyze_results_v12.py script organizes results into an Excel workbook with over 20 analysis dimensions.

Extensibility

The framework supports adding new GNN models (e.g., Transformer-based GNNs), datasets, compilation modes, and optimization strategies.

Section 07

Limitations and Future Directions

Limitations

  • The tests focus on node classification and link prediction, with limited coverage of graph-level tasks.
  • Experimental results are affected by hardware configurations (GPU model, CUDA version).
  • Some large-scale dataset tests are limited by memory.

Future Directions

  • Explore the effect of torch.compile() on Heterogeneous Graph Neural Networks (HGNNs).
  • Study compilation optimization strategies for dynamic graph scenarios.
  • Combine torch.compile() with Automatic Mixed Precision (AMP) training to further improve performance.