Zing Forum


ST-SNN: A New Method for Spatiotemporal Graph Convolution Action Recognition Based on Sheaf Neural Networks

This article introduces ST-SNN, an architecture that replaces the graph convolutional network (GCN) in ST-GCN with a sheaf neural network. Orthogonal restriction maps let it model heterogeneous interactions between joints, raising accuracy on the NTU RGB+D dataset from the 81.5% ST-GCN baseline to 85.4%. Combined with a more advanced temporal module, it performs comparably to STGCN++.

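Tags: sheaf neural networks, spatiotemporal graph convolution, action recognition, heterogeneous interactions, graph neural networks, ST-GCN, skeleton data, PySKL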
Published 2026-05-21 18:16 | Recent activity 2026-05-21 18:18 | Estimated read 5 min

Section 01

ST-SNN: Guide to a New Action Recognition Method Based on Sheaf Neural Networks

ST-SNN replaces the graph convolutional network (GCN) in ST-GCN with a sheaf neural network. Orthogonal restriction maps let it model heterogeneous interactions between joints, raising accuracy on the NTU RGB+D dataset from the 81.5% ST-GCN baseline to 85.4%. Paired with the MS-TCN temporal module, it reaches about 89.0%, close to the 89.4% of STGCN++.


Section 02

Research Background and Motivation

Human action recognition is a core computer vision task, with applications in intelligent surveillance, human-computer interaction, and motion analysis. Skeleton-based methods are attractive because they are robust to lighting changes, occlusion, and viewpoint variation. ST-GCN models the spatial relationships between human joints with a GCN, but GCN aggregation assumes that neighboring nodes behave alike (the homogeneity assumption). This leads to over-smoothing and makes it hard to capture heterogeneous interactions, where adjacent joints follow completely different motion patterns.
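The over-smoothing effect can be illustrated with a small sketch. It assumes a hypothetical three-joint chain and uses plain mean aggregation over neighbors as a simplified stand-in for a GCN layer (no learned weights, no nonlinearity). None of the names below come from ST-GCN or PySKL.

```python
import numpy as np

# Hypothetical 3-joint chain (e.g. shoulder - elbow - wrist) with self-loops.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)

# Row-normalize so each node takes the mean of itself and its neighbors.
P = A / A.sum(axis=1, keepdims=True)

def propagate(x, steps):
    """Apply `steps` rounds of mean aggregation to node features x."""
    for _ in range(steps):
        x = P @ x
    return x

def spread(x):
    """Largest per-channel gap between nodes; 0 means nodes are indistinguishable."""
    return float(np.max(x.max(axis=0) - x.min(axis=0)))

# Adjacent joints start with very different 2-D features.
x0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [-1.0, 0.0]])

print(spread(x0))                 # gap before any aggregation
print(spread(propagate(x0, 30)))  # gap after 30 rounds, close to zero
```

After enough rounds, all three joints carry nearly the same feature vector no matter how different they were at the start, which is the behavior that repeated homogeneous aggregation produces.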


Section 03

Core Idea and Architecture Design

A sheaf neural network (SheafNN) builds on sheaf theory: each node is assigned its own feature space, and interactions between nodes are defined by restriction maps. The core idea of ST-SNN is to replace the GCN's adjacency-based propagation with a sheaf Laplacian built from orthogonal restriction maps, which is intended to avoid over-smoothing. The architecture has two parts: a spatial module, where SheafNN replaces GCN, and a temporal module, which is either the standard temporal module or the MS-TCN multi-scale temporal convolution module.
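As a rough illustration of the sheaf Laplacian, the sketch below assembles one for a hypothetical three-node chain using the standard block definition: for an edge (u, v) with restriction maps F_u and F_v, the diagonal blocks accumulate F^T F and the off-diagonal block is -F_u^T F_v. The fixed 2-D rotations stand in for learned orthogonal maps. The function names and the toy graph are assumptions for illustration, not code from the ST-SNN project.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix, used here as a simple orthogonal restriction map."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def sheaf_laplacian(num_nodes, edges, maps, d):
    """Assemble the (num_nodes*d) x (num_nodes*d) sheaf Laplacian.

    edges: list of (u, v) node pairs
    maps:  list of (F_u, F_v) restriction maps, one d x d pair per edge
    """
    L = np.zeros((num_nodes * d, num_nodes * d))
    for (u, v), (F_u, F_v) in zip(edges, maps):
        su = slice(u * d, (u + 1) * d)
        sv = slice(v * d, (v + 1) * d)
        L[su, su] += F_u.T @ F_u   # equals the identity when F_u is orthogonal
        L[sv, sv] += F_v.T @ F_v
        L[su, sv] -= F_u.T @ F_v
        L[sv, su] -= F_v.T @ F_u
    return L

d = 2
edges = [(0, 1), (1, 2)]
maps = [(rotation(0.0), rotation(np.pi / 2)),
        (rotation(0.0), rotation(np.pi / 2))]
L = sheaf_laplacian(3, edges, maps, d)

# A signal whose neighbors agree only after the restriction maps are applied.
x0 = np.array([1.0, 0.0])
x1 = maps[0][1].T @ maps[0][0] @ x0
x2 = maps[1][1].T @ maps[1][0] @ x1
x = np.concatenate([x0, x1, x2])

print(np.allclose(L @ x, 0.0))  # diffusion leaves this signal untouched
print(x.reshape(3, d))          # yet the three node features differ
```

With identity maps the same function reduces to the ordinary graph Laplacian, whose null space on a connected graph contains only constant signals. Non-trivial orthogonal maps allow signals that differ from node to node to be stable under diffusion, which is one way to read the claim that the sheaf formulation avoids over-smoothing.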


Section 04

Experimental Results and Performance Analysis

Results on the ntu60_xsub_3d benchmark of the NTU RGB+D dataset:

Model             | Spatial Module | Temporal Module | Accuracy
ST-GCN (Baseline) | GCN            | Standard        | 81.5%
ST-SNN            | SheafNN        | Standard        | 85.4%
STGCN++           | GCN            | MS-TCN          | 89.4%
ST-SNN++          | SheafNN        | MS-TCN          | ~89.0%

Key findings: Replacing the GCN spatial module with SheafNN improves accuracy by 3.9 percentage points (81.5% to 85.4%). Combined with MS-TCN, ST-SNN++ reaches about 89.0%, within roughly 0.4 points of STGCN++ (89.4%). The added computational cost is described as manageable.
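The differences quoted in the key findings follow directly from the table. The short sketch below recomputes them; the dictionary keys and the helper name are chosen for illustration only, and the ST-SNN++ figure is the approximate value reported above.

```python
# Accuracy (%) on ntu60_xsub_3d, copied from the results table.
results = {
    "ST-GCN": 81.5,
    "ST-SNN": 85.4,
    "STGCN++": 89.4,
    "ST-SNN++": 89.0,  # reported as approximately 89.0%
}

def gain(model, reference):
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round(results[model] - results[reference], 1)

# SheafNN vs. GCN with the standard temporal module.
spatial_gain = gain("ST-SNN", "ST-GCN")
# SheafNN vs. GCN when both use MS-TCN.
gap_to_stgcnpp = gain("ST-SNN++", "STGCN++")

print(spatial_gain)    # 3.9
print(gap_to_stgcnpp)  # -0.4
```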

Section 05

Technical Details and Implementation Key Points

Orthogonal restriction maps are the critical component: each edge learns an orthogonal transformation matrix, which preserves the structure of the feature space (orthogonal maps leave vector norms and angles unchanged). ST-SNN is implemented as a PySKL plugin module that includes topology modules, configuration files, and MMCV registry integration, so it can be integrated into existing PySKL workflows.
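The summary does not say how ST-SNN keeps the learned matrices orthogonal. One common option, shown here purely as an assumption, is the Cayley transform, which maps unconstrained parameters to an orthogonal matrix through a skew-symmetric intermediate. The function name `cayley_orthogonal` is hypothetical.

```python
import numpy as np

def cayley_orthogonal(params, d):
    """Map d*(d-1)/2 free parameters to a d x d orthogonal matrix.

    Uses the Cayley transform Q = (I + A)^{-1} (I - A) with A skew-symmetric.
    """
    A = np.zeros((d, d))
    A[np.triu_indices(d, k=1)] = params  # fill the strict upper triangle
    A = A - A.T                          # skew-symmetric: A.T == -A
    I = np.eye(d)
    # I + A is always invertible for skew-symmetric A, so solve() is safe.
    return np.linalg.solve(I + A, I - A)

rng = np.random.default_rng(0)
d = 4
Q = cayley_orthogonal(rng.normal(size=d * (d - 1) // 2), d)
x = rng.normal(size=d)

print(np.allclose(Q.T @ Q, np.eye(d)))               # Q is orthogonal
print(np.isclose(np.linalg.norm(Q @ x),
                 np.linalg.norm(x)))                 # norms are preserved
```

Because any parameter vector yields a valid orthogonal matrix, a parametrization like this can be trained with ordinary gradient descent without a separate projection step. Whether ST-SNN uses this or another scheme (for example Householder reflections or a matrix exponential) is not stated in the source.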


Section 06

Application Prospects and Extension Directions

  1. The approach is suited to other heterogeneous graph data, such as social networks, molecular structures, and knowledge graphs.
  2. Multi-modal fusion of visual and skeleton features can be explored.
  3. The physical meaning of the learned restriction maps can be investigated to improve model interpretability.

Section 07

Summary and Outlook

ST-SNN addresses the over-smoothing problem of ST-GCN by using sheaf neural networks, improving accuracy on NTU RGB+D from 81.5% to 85.4% and demonstrating the potential of topological methods in deep learning. The project provides a complete PySKL integration, and we look forward to more innovative applications of sheaf neural networks.