Zing Forum

Exploring the Function Approximation Capability of Neural Networks: From the Universal Approximation Theorem to Practical Verification

An in-depth analysis of the theoretical foundation of the Universal Approximation Theorem, exploring the mechanism by which single-hidden-layer neural networks approximate arbitrary continuous functions, and how to balance network complexity and approximation accuracy in practical applications.

Tags: Universal Approximation Theorem · Neural Networks · Function Approximation · Deep Learning Theory · Activation Functions · ReLU · Sigmoid · Machine Learning · Mathematical Foundations
Published 2026-05-01 09:41 · Recent activity 2026-05-01 10:14 · Estimated read 6 min

Section 01

[Main Post / Introduction] Exploring the Function Approximation Capability of Neural Networks: From the Universal Approximation Theorem to Practical Verification

This article focuses on the function approximation capability of neural networks, centering on the theoretical foundation of the Universal Approximation Theorem and its application in practice. The theorem provides a mathematical guarantee of the strong expressive power of neural networks: a single hidden layer with sufficiently many neurons can approximate continuous functions on compact sets. However, a gap remains between theory and practice (architecture selection, optimization, generalization); experiments verify the theorem and show the impact of the choice of activation function. In practical applications, model complexity must be balanced against generalization. The theorem relates to other machine learning theories and serves as a bridge between theory and practice.

Section 02

Background: Core and Intuitive Understanding of the Universal Approximation Theorem

The Mystery of Neural Network Expressive Power

The theoretical foundation for the ability of neural networks to solve such a wide range of tasks can be traced back to the Universal Approximation Theorem.

Core Content of the Theorem

A single-hidden-layer feedforward neural network with enough neurons in the hidden layer and an activation function satisfying certain conditions (such as Sigmoid or ReLU) can approximate any continuous function on a compact set to arbitrary precision. Key points: a single hidden layer is sufficient; the theorem does not say how many neurons are needed; and it applies to continuous functions on compact sets.
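
For reference, a standard formal statement in the Cybenko/Hornik style reads as follows; the notation is a common textbook formulation added here, not a quotation from the post:

```latex
% Standard Cybenko/Hornik-style statement, added here for reference.
Let $\sigma$ be an activation function satisfying the theorem's conditions
(e.g.\ a sigmoidal function). For every continuous $f : K \to \mathbb{R}$
on a compact set $K \subset \mathbb{R}^{n}$ and every $\varepsilon > 0$,
there exist $N \in \mathbb{N}$, weights $w_i \in \mathbb{R}^{n}$, and scalars
$\alpha_i, b_i \in \mathbb{R}$ such that
\[
  \sup_{x \in K} \left|\, f(x) - \sum_{i=1}^{N} \alpha_i \,
  \sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon .
\]
```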

Intuitive Understanding

Neural networks are like flexible function-construction tools: hidden neurons combine simple local nonlinear transformations into a complex global mapping (a divide-and-conquer strategy).
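
To make this divide-and-conquer picture concrete, here is a minimal NumPy sketch (an illustration added here, not code from the post): two steep, shifted sigmoids are subtracted to form a "bump" that is roughly 1 on a small interval and 0 elsewhere, and a weighted sum of such bumps approximates sin(x). The interval count, steepness, and target function are all illustrative choices.

```python
# Minimal, illustrative sketch (assumed setup): approximate sin(x) on [0, 2*pi]
# with a weighted sum of sigmoid "bumps" -- i.e., a hand-built single-hidden-layer net.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, left, right, steepness=50.0):
    # Difference of two steep sigmoids ~ indicator of the interval [left, right].
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

x = np.linspace(0.0, 2.0 * np.pi, 1000)
target = np.sin(x)

# Partition the domain into small intervals; on each, output the target's value
# at the interval's midpoint. More intervals -> a finer, more accurate fit.
n_bumps = 50
edges = np.linspace(0.0, 2.0 * np.pi, n_bumps + 1)
approx = np.zeros_like(x)
for left, right in zip(edges[:-1], edges[1:]):
    approx += np.sin((left + right) / 2.0) * bump(x, left, right)

print("max abs error:", np.max(np.abs(approx - target)))
```

Increasing n_bumps (i.e., adding hidden neurons) should shrink the error, which is exactly the trade-off the theorem leaves unquantified.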

Section 03

The Gap from Theory to Practice: Challenges from Existence to Effective Approximation

The Universal Approximation Theorem is an existence theorem; it does not tell us how to achieve an effective approximation in practice (a minimal comparison sketch follows the list below):

  1. Architecture Selection: a single hidden layer is sufficient in theory, but deep networks often work better in practice (hierarchical feature extraction);
  2. Optimization: the loss function is non-convex, with local optima and saddle points, so gradient-based training may never find the weights whose existence the theorem guarantees;
  3. Generalization: the theorem only concerns approximation capability, while machine learning cares more about generalization to new data (a question for statistical learning theory).
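
As one way to explore the first point, the hedged sketch below (an assumed setup, not taken from the article) fits a wide single-hidden-layer network and a deeper, narrower one to the same one-dimensional target using scikit-learn's MLPRegressor; whether the deeper model actually wins depends on the target function, the random seed, and the training budget.

```python
# Hedged, illustrative comparison: shallow-and-wide vs. deep-and-narrow.
# The target function, layer sizes, and iteration budget are arbitrary choices.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(3.0 * X[:, 0]) * np.exp(-0.1 * X[:, 0] ** 2)  # a wiggly 1-D target

models = {
    "shallow (256,)": MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000, random_state=0),
    "deep (32,32,32)": MLPRegressor(hidden_layer_sizes=(32, 32, 32), max_iter=2000, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"{name:16s} train MSE = {mse:.2e}")
```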

Section 04

Evidence: Experimental Verification and the Impact of Activation Functions

Experimental verification from open-source projects (a minimal reproduction sketch follows this list):

  • As the number of neurons in the hidden layer increases, the approximation error decreases (verifying the core claim of the theorem);
  • The error does not decrease at a uniform rate; some regions of the target function require more neurons;
  • Activation functions behave differently: ReLU converges faster but performs poorly on certain functions, so the choice should be guided by the problem.
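
The kind of sweep described above can be reproduced in spirit with a few lines of scikit-learn (an assumed setup, not the cited open-source project): vary the hidden width and the activation function, and record the fit error on one fixed target.

```python
# Hedged reproduction sketch: widen the hidden layer and swap activations,
# then watch how the approximation error changes. Widths, target, and
# iteration budget are illustrative, not taken from the original experiment.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(1000, 1))
y = np.sin(X[:, 0])  # the function to approximate

for activation in ("relu", "tanh", "logistic"):  # "logistic" is the Sigmoid
    for width in (4, 16, 64, 256):
        model = MLPRegressor(hidden_layer_sizes=(width,),
                             activation=activation,
                             max_iter=3000,
                             random_state=0)
        model.fit(X, y)
        mse = np.mean((model.predict(X) - y) ** 2)
        print(f"{activation:8s} width={width:4d} MSE={mse:.2e}")
```

In runs of this kind one would expect the error to fall as the width grows, though not at a uniform rate, matching the observations listed above.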

Section 05

Practical Application Considerations: Balancing Model Complexity and Generalization

  1. Source of Confidence: when the data contain learnable patterns, a neural network can in principle discover them;
  2. Overfitting Risk: an expressive enough network can also approximate noise, so regularization, early stopping, and data augmentation are needed (see the configuration sketch after this list);
  3. Complexity Trade-off: adding neurons improves approximation capability but increases computational cost and overfitting risk, so an appropriate scale must be found.
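
For concreteness, scikit-learn's MLPRegressor exposes L2 regularization and validation-based early stopping directly; the values below are illustrative defaults, not tuned recommendations.

```python
# Hedged configuration sketch: combat overfitting with an L2 penalty (alpha)
# and early stopping on a held-out validation split. Values are illustrative.
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(64,),
    alpha=1e-3,               # L2 penalty: discourages fitting noise
    early_stopping=True,      # hold out part of the training data ...
    validation_fraction=0.1,  # ... and stop when its score stops improving
    n_iter_no_change=20,
    max_iter=5000,
    random_state=0,
)
# model.fit(X_train, y_train)  # X_train / y_train stand in for your own data
```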

Section 06

Conclusions and Recommendations: Practical Directions Guided by Theory

  • The Universal Approximation Theorem is a bridge between theory and practice: it explains why neural networks are powerful while reminding practitioners of the issues the theorem itself does not address;
  • Experiments make the abstract theory concrete; learners are encouraged to build intuition through small experiments of their own;
  • Revisiting foundational theory helps clarify the boundaries of existing techniques and guides the design of future models.