# Building a Feedforward Neural Network from Scratch: A Deep Learning Practice for Protein Folding State Classification

> Implement a complete feedforward neural network from scratch using only NumPy, perform three-class classification (folded/intermediate/unfolded) on molecular dynamics simulation data of the Trp-cage mini-protein, and gain an in-depth understanding of the mathematical principles behind neural networks.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-15T04:13:00.000Z
- Last activity: 2026-06-15T04:31:31.115Z
- Popularity: 154.7
- Keywords: feedforward neural network, protein folding, molecular dynamics, NumPy, from-scratch implementation, deep learning, Trp-cage, RMSD, ETE, bioinformatics
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-ptan123-ffnn-project
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-ptan123-ffnn-project

---

## Introduction: Building a Feedforward Neural Network from Scratch for Protein Folding State Classification

This project is developed and maintained by ptan123 and was released on GitHub on June 15, 2026 (project title: FFNN_Project, link: https://github.com/ptan123/FFNN_Project). It implements a feedforward neural network from scratch using only NumPy and applies it to three-class classification (folded, intermediate, unfolded) of molecular dynamics simulation data for the Trp-cage mini-protein. The aim is an in-depth understanding of the mathematical principles behind neural networks.

## Project Background and Scientific Significance

Protein folding is a core problem in biochemistry: a protein's three-dimensional structure determines its function, stability, and interactions, which matters for drug design and for research into disease mechanisms. Trp-cage, a mini-protein of 20 amino acids, is an ideal model system for folding research because molecular dynamics simulations of it can generate large amounts of conformational data. Assigning each conformation to a state, however, is a challenge. The three states are characterized as follows: folded (low RMSD, low ETE, functional), intermediate (high RMSD, low ETE, non-functional), and unfolded (high RMSD, high ETE, non-functional).

## Technical Objectives and Dataset Description

This project is a practice exercise for the CH610 machine learning course. Its goal is to build a fully functional feedforward neural network from scratch, using only NumPy and no high-level frameworks. The reason for implementing from scratch is that modern frameworks hide the underlying mechanics, whereas writing each piece by hand gives an in-depth understanding of the core components: forward propagation, activation functions, backpropagation, loss functions, and optimization algorithms. The dataset is Trp-cage simulation data with two features: RMSD (root-mean-square deviation, which measures how far a conformation deviates from the reference structure) and ETE (end-to-end distance, which reflects how compact the chain is).
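To make the two features concrete, here is a minimal sketch of how RMSD and ETE could be computed from a single frame of coordinates. The function names and the toy coordinates are assumptions made for illustration, not taken from the project, and the RMSD here assumes the frame has already been superimposed on the reference (no alignment step is performed).

```python
import numpy as np

def rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) arrays.

    Assumption: the frame is already aligned to the reference,
    so no superposition is done here.
    """
    diff = coords - reference
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def end_to_end_distance(coords):
    """Distance between the first and last atom of the chain."""
    return float(np.linalg.norm(coords[-1] - coords[0]))

# Toy data (hypothetical): a straight 4-atom chain along x,
# and the same chain shifted by 1 unit along y.
reference = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
frame = reference + np.array([0.0, 1.0, 0.0])

# One feature vector per frame: (RMSD, ETE)
features = np.array([rmsd(frame, reference), end_to_end_distance(frame)])
print(features)  # [1. 3.]
```

Every atom in the toy frame is displaced by exactly 1 unit, so the RMSD is 1, while the rigid shift leaves the end-to-end distance at 3.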

## Neural Network Architecture and Core Algorithms

Architecture design: the input layer receives the 2-dimensional feature vector (RMSD, ETE), the hidden layer uses the ReLU activation function, max(0, x), and the output layer uses softmax to produce probabilities for the three classes. Core algorithms: forward propagation passes the input through the hidden layer and the output layer to the softmax; the loss function is cross-entropy, which measures the difference between the predicted and true distributions, with labels one-hot encoded; backpropagation computes the gradients of the loss via the chain rule; and gradient descent uses those gradients to update the parameters (weights and biases).
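The pipeline described above can be sketched in a few dozen lines of NumPy. This is an illustrative sketch, not the project's code: the hidden width of 8, the learning rate, the epoch count, the feature standardization, and the synthetic three-blob data standing in for (RMSD, ETE) are all assumptions chosen only so the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(labels, n_classes):
    out = np.zeros((labels.size, n_classes))
    out[np.arange(labels.size), labels] = 1.0
    return out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1
    a1 = np.maximum(0.0, z1)              # ReLU
    probs = softmax(a1 @ W2 + b2)
    return z1, a1, probs

def cross_entropy(probs, Y):
    return float(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))

def gradients(X, Y, params):
    W1, b1, W2, b2 = params
    z1, a1, probs = forward(X, params)
    dz2 = (probs - Y) / X.shape[0]        # gradient of mean cross-entropy w.r.t. logits
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)         # chain rule through ReLU
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

# Synthetic stand-in data (assumption): three blobs in (RMSD, ETE) space,
# ordered as folded, intermediate, unfolded.
centers = np.array([[1.0, 1.0], [4.0, 1.0], [4.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.3 * rng.standard_normal((150, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
Y = one_hot(labels, 3)

hidden = 8  # hypothetical hidden-layer width
params = [0.5 * rng.standard_normal((2, hidden)), np.zeros(hidden),
          0.5 * rng.standard_normal((hidden, 3)), np.zeros(3)]

initial_loss = cross_entropy(forward(X, params)[2], Y)
lr = 0.2
for epoch in range(1000):                 # full-batch gradient descent
    grads = gradients(X, Y, params)
    params = [p - lr * g for p, g in zip(params, grads)]

probs = forward(X, params)[2]
final_loss = cross_entropy(probs, Y)
accuracy = float(np.mean(probs.argmax(axis=1) == labels))
print(initial_loss, final_loss, accuracy)
```

The line `dz2 = (probs - Y) / n` relies on the standard simplification that the gradient of softmax followed by cross-entropy with respect to the logits is the predicted probabilities minus the one-hot labels.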

## Model Evaluation and Implementation Trade-offs

Evaluation methods: test accuracy (the basic metric), learning curves (to judge convergence and overfitting), a confusion matrix (to see which classes are confused with each other), and decision boundary visualization (to show the learned classification rule directly in the RMSD/ETE plane). Advantages of the NumPy implementation: it is transparent, educational, flexible, and lightweight. Challenges: there is no GPU acceleration, advanced features such as batch normalization are missing, debugging is harder, and production features such as model saving and loading are absent.
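As a small illustration of two of these metrics, the sketch below builds a confusion matrix and derives accuracy from it in plain NumPy. The function name and the hand-written label arrays are hypothetical, and the class order 0/1/2 is assumed to mean folded/intermediate/unfolded.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated index pairs
    return cm

# Hypothetical labels: 0 = folded, 1 = intermediate, 2 = unfolded
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()      # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

Off-diagonal entries show which pairs of states the model mixes up; in this toy case one folded sample is predicted as intermediate and one unfolded sample as folded.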

## Scientific Value and Extension Directions

Scientific value: the project demonstrates how machine learning and biochemistry can be combined, provides a tool for analyzing large-scale molecular simulation data, and allows the model's behavior to be checked against physical principles, since RMSD and ETE both have a clear physical meaning. Extension directions: introduce more features (radius of gyration, contact maps), try more complex architectures (CNN, RNN), scale to larger protein systems, add uncertainty quantification (Bayesian neural networks), and apply active learning to choose which conformations to sample.
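Of the suggested extra features, radius of gyration is the simplest to add. The sketch below is a hypothetical helper, not part of the project, and it assumes all atoms have equal mass (an unweighted radius of gyration).

```python
import numpy as np

def radius_of_gyration(coords):
    """Unweighted radius of gyration of an (n_atoms, 3) array.

    Assumption: all atoms have equal mass.
    """
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))

# Toy data: four points at the corners of a square centered on the origin
square = np.array([[1.0, 1, 0], [1, -1, 0], [-1, 1, 0], [-1, -1, 0]])
rg = radius_of_gyration(square)
print(rg)  # sqrt(2), about 1.414
```

Appending this value to each frame's (RMSD, ETE) vector would raise the input dimension from 2 to 3, so the first weight matrix of the network would need a matching shape.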

## Summary and Key Insights

This project is a useful teaching case that shows the value of interdisciplinary work. The key takeaway is not the classification accuracy but the in-depth understanding of how neural networks work, which is the foundation for applying machine learning sensibly in scientific research. The open-source implementation gives learners a reference. Mature frameworks are still the right choice in production environments, but mastering the underlying principles is a necessary step toward becoming a strong machine learning practitioner.