Zing Forum

ChiGNN: A New Method for Protein Side Chain Conformation Prediction Based on Diffusion Models

ChiGNN is a generative AI model based on torsional diffusion. It uses the von Mises distribution to model protein side chain dihedral angles on the circle S¹, offering a lightweight approach to protein structure prediction with calibrated uncertainty estimates.

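Tags: protein structure prediction, torsional diffusion, graph neural network, side chain conformation, von Mises distribution, uncertainty quantification, bioinformatics, generative AI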
Published 2026-05-16 20:55 | Recent activity 2026-05-16 20:59 | Estimated read 6 min

Section 01

ChiGNN: Introduction to the New Method for Protein Side Chain Conformation Prediction Based on Diffusion Models

ChiGNN is a generative AI model based on torsional diffusion. Its core innovation is to model protein side chain dihedral angles on the circle S¹ with the von Mises distribution, which addresses a limitation of existing methods: they cannot capture conformational distributions. The model is lightweight (just over 80,000 parameters) and produces calibrated uncertainty estimates, offering a new option for protein structure prediction.

Section 02

Background and Challenges

In protein structure prediction, side chain dihedral angles shape functionally important features such as active sites, ligand binding, and hydrogen bond networks. Classical methods (SCWRL4, Rosetta) rely on deterministic optimization and so cannot reflect dynamic conformational distributions. AlphaFold2 made breakthroughs in backbone prediction, but side chain conformation remains a challenge.

Section 03

Technical Methods and Architecture

Core Innovations

  • Torsional Diffusion in S¹ Space: Uses a von Mises noise distribution, which respects the periodicity of circular data, instead of traditional Gaussian diffusion.
  • Lightweight GNN: 4 GCNConv layers with batch normalization and residual connections, 80,404 parameters in total, which runs efficiently on a Colab T4 GPU.
  • Circular DDIM Sampling: The reverse process is adapted to the S¹ space so that sampled angles remain valid.
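
To make the contrast with Gaussian diffusion concrete, the sketch below perturbs dihedral angles with von Mises noise and wraps the result back onto the circle. It is a minimal NumPy illustration, not ChiGNN's implementation: the function names, the concentration value `kappa=4.0`, and the example angles are assumptions chosen for demonstration.

```python
import numpy as np

def wrap_angle(theta):
    """Map angles in radians onto the interval [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi

def add_von_mises_noise(chi, kappa, rng):
    """Perturb dihedral angles with zero-mean von Mises noise.

    kappa is the concentration: a large kappa gives little noise,
    while kappa near 0 approaches the uniform distribution on the circle.
    """
    chi = np.asarray(chi, dtype=float)
    noise = rng.vonmises(mu=0.0, kappa=kappa, size=chi.shape)
    return wrap_angle(chi + noise)

rng = np.random.default_rng(0)
chi1 = np.deg2rad([-60.0, 60.0, 180.0])  # illustrative angles (assumed)
noisy = add_von_mises_noise(chi1, kappa=4.0, rng=rng)
```

Because the noise is wrapped, an angle near 180° that is pushed past the boundary reappears near -180° instead of leaving the valid range, which is the property a Gaussian on the real line does not give for free.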

Data Flow

The input is a protein graph in which nodes carry residue information and two residues are connected by an edge if their Cα atoms are less than 8 Å apart. Forward diffusion adds von Mises noise to the dihedral angles, and in the reverse process the GNN predicts gradients that are used to recover the original angles.
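
The graph construction described above can be sketched as follows. Only the 8 Å Cα cutoff comes from the text; the function name `build_contact_edges`, the `(2, E)` edge layout, the exclusion of self-loops, and the toy coordinates are assumptions.

```python
import numpy as np

def build_contact_edges(ca_coords, cutoff=8.0):
    """Return a (2, E) array of directed edges between residues whose
    C-alpha atoms are closer than `cutoff` angstroms (self-loops excluded)."""
    ca = np.asarray(ca_coords, dtype=float)
    # Pairwise distance matrix of shape (N, N)
    diff = ca[:, None, :] - ca[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(ca), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst])

# Toy example: four residues spaced 3.8 angstroms apart along a line
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
edge_index = build_contact_edges(ca)
```

In this toy chain, residues one or two positions apart are connected, while the first and last residues (11.4 Å apart) are not. The dense distance matrix is fine for single proteins but grows quadratically with the number of residues.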

Section 04

Datasets and Experimental Evidence

Datasets

597 high-resolution (<2.0 Å) structures were selected from PDB-REDO and split at the protein level into training, validation, and test sets (80%/10%/10%).
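
Splitting at the protein level means that every residue of a given protein lands in the same subset, so no protein contributes to both training and evaluation. A minimal sketch, assuming hypothetical protein identifiers and a fixed shuffle seed (neither is given in the source):

```python
import random

def split_by_protein(protein_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle unique protein IDs and cut them into train/val/test lists."""
    ids = sorted(set(protein_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    # Whatever remains after train and val goes to the test set
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical IDs standing in for the 597 PDB-REDO entries
all_ids = [f"prot{i:04d}" for i in range(597)]
train_ids, val_ids, test_ids = split_by_protein(all_ids)
```

With this rounding rule, 597 proteins yield 477/59/61 entries; the exact counts in the original work may differ.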

Training

The model was trained for 50 epochs with the AdamW optimizer and cosine annealing of the learning rate. The best performance was reached at epoch 42 (validation loss: 0.0885).
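
For readers unfamiliar with cosine annealing, the standard closed form of the schedule is sketched below. The source does not state the learning rates or which implementation was used, so `lr_max=1e-3` and `lr_min=0.0` are placeholder assumptions; only the 50-epoch horizon comes from the text.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along a half cosine."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

# Learning rate at the start of each of the 50 epochs, plus the end point
schedule = [cosine_annealing_lr(e, 50, lr_max=1e-3) for e in range(51)]
```

The rate starts at `lr_max`, passes through the midpoint value halfway through training, and reaches `lr_min` at the final epoch.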

Results

  • χ₁ circular MAE: 56.41°, lower than the modal baseline of ~60-65°.
  • χ₁ rotamer recovery within ±40°: 53.9%, higher than the baseline of ~47%.
  • Rose plots reproduce the trimodal distribution of side chain angles, suggesting that the model has learned biophysical constraints.
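
The two χ₁ metrics above depend on measuring angular error along the shorter arc of the circle, so that 170° and -175° count as 15° apart, not 345°. A minimal sketch of how such metrics are commonly computed; the function names, the inclusive tolerance, and the example angles are assumptions, not taken from the ChiGNN code:

```python
import numpy as np

def circular_abs_error_deg(pred_deg, true_deg):
    """Absolute angular difference in degrees, always in [0, 180]."""
    diff = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    return np.abs((diff + 180.0) % 360.0 - 180.0)

def circular_mae_deg(pred_deg, true_deg):
    """Mean absolute error on the circle."""
    return float(circular_abs_error_deg(pred_deg, true_deg).mean())

def recovery_rate(pred_deg, true_deg, tol_deg=40.0):
    """Fraction of predictions within tol_deg of the true angle."""
    return float((circular_abs_error_deg(pred_deg, true_deg) <= tol_deg).mean())

pred = np.array([170.0, -65.0, 50.0, 10.0])
true = np.array([-175.0, -60.0, 180.0, -60.0])
mae = circular_mae_deg(pred, true)   # errors are 15, 5, 130, 70 -> 55.0
rate = recovery_rate(pred, true)     # two of four within 40 degrees -> 0.5
```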

Section 05

Conclusions and Unique Value

ChiGNN's core value is its uncertainty quantification: the prediction variance correlates with the actual error (Spearman ρ=0.299, p<0.001), a modest but statistically significant relationship that can support reliability assessment, experimental guidance, and iterative optimization. Deterministic methods (SCWRL4, Rosetta) lack this kind of self-diagnosis, and AlphaFold's uncertainty reflects data coverage, so ChiGNN's estimate is more targeted to side chain conformation. Potential applications include drug design (rational allocation of experimental resources) and protein engineering (understanding mutation effects), and the open-source implementation lowers the barrier to use.
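
The calibration check described above can be illustrated with a small sketch. The source does not say how prediction variance is computed, so the circular variance over repeated samples used here is an assumption, as are the function names and the toy numbers. The rank correlation below assumes no tied values.

```python
import numpy as np

def circular_variance(samples_rad, axis=0):
    """1 minus the mean resultant length: 0 when all samples agree,
    approaching 1 when they are spread evenly around the circle."""
    c = np.cos(samples_rad).mean(axis=axis)
    s = np.sin(samples_rad).mean(axis=axis)
    return 1.0 - np.hypot(c, s)

def spearman_rho(x, y):
    """Spearman rank correlation (valid when there are no ties)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Toy data: per-residue predicted variance and observed angular error
variance = np.array([0.1, 0.4, 0.2, 0.9, 0.5])
error_deg = np.array([5.0, 30.0, 20.0, 80.0, 25.0])
rho = spearman_rho(variance, error_deg)  # 0.9 for this toy data
```

A positive ρ means residues with higher predicted variance tend to have larger errors, which is what allows the variance to serve as a reliability signal.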

Section 06

Limitations and Future Recommendations

Limitations

  • Small dataset size (597 proteins) limits generalization;
  • GCNConv is non-equivariant and sensitive to protein orientation;
  • Models dihedral angles independently without capturing the joint distribution.

Future Directions

Future work could expand the dataset to over 10,000 structures, adopt an equivariant EGNN architecture, model the dihedral angles jointly with multivariate diffusion, validate on the CASP15 benchmark, and add lightweight force field refinement.