Zing Forum


Comparative Study of Neural Network Architectures for Surface Crack Detection: Performance Evolution from FFNN to Transfer Learning

An in-depth analysis of a deep learning research project on surface crack detection, comparing the performance of four architectures—FFNN, LSTM-RNN, CNN, and ResNet18 transfer learning—on a dataset of approximately 228,000 grayscale images, revealing the strengths and weaknesses of different neural network architectures in industrial visual inspection tasks.

Surface crack detection · Computer vision · Deep learning · CNN · ResNet · Transfer learning · LSTM · Neural network comparison · Industrial quality inspection · PyTorch
Published 2026-05-11 08:25 · Recent activity 2026-05-11 10:17 · Estimated read: 7 min
1

Section 01

[Introduction] Core Summary of Comparative Study on Neural Network Architectures for Surface Crack Detection

This study addresses surface crack recognition in industrial visual inspection, comparing four neural network architectures: FFNN, LSTM-RNN, CNN, and ResNet18 transfer learning. Based on a dataset of approximately 228,000 grayscale images, it traces the performance evolution from basic to advanced models, with the ResNet18 transfer learning model achieving the best result (86% accuracy after tuning). The following floors cover the research background, methods, results, and practical applications.

2

Section 02

Research Background: Practical Challenges and Needs of Industrial Visual Inspection

Surface crack detection is a key step in industrial quality control, widely used in building safety assessment, manufacturing quality inspection, and related fields. Traditional manual inspection is inefficient and subject to human judgment, making it difficult to meet the precision and speed requirements of modern industry. Deep learning has made automated detection feasible, but several questions remain: Are simple fully connected networks up to the task? Are recurrent neural networks suitable for image data? What advantages do convolutional networks offer? How much improvement can transfer learning bring? These questions form the core of this study.

3

Section 03

Dataset and Preprocessing Pipeline

The study uses Kaggle's "Cracked and Non-Cracked Surface Datasets", which contains approximately 228,000 grayscale images (balanced dataset). The preprocessing pipeline includes: 1. Establishing a data warehouse to record image paths; 2. Data visualization analysis; 3. Uniformly resizing to 227×227 pixels; 4. Achieving 3x data augmentation through horizontal flipping and color jitter; 5. Undersampling the majority class to balance categories. The dataset is divided into training, validation, and test sets in an 80%/10%/10% ratio, with a random seed of 42 to ensure reproducibility.
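The reproducible split at the end of the pipeline can be sketched in pure Python. This is a minimal illustration of an 80%/10%/10% split with a fixed seed of 42, not the study's actual code; the function name and the use of `random.Random` are assumptions for the sketch:

```python
import random

def split_indices(n, train=0.8, val=0.1, seed=42):
    """Shuffle dataset indices reproducibly, then split 80/10/10."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed -> same split every run
    n_train = int(n * train)
    n_val = int(n * val)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Example: a 1,000-image subset splits into 800 / 100 / 100 indices.
train_idx, val_idx, test_idx = split_indices(1000)
```

Because the seed is fixed, rerunning the function yields identical train/validation/test partitions, which is what makes the reported accuracies comparable across architectures.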

4

Section 04

Comparison of Four Neural Network Architectures

The study compares four architectures:

  1. FFNN: A baseline model that flattens images into one-dimensional vectors and inputs them into fully connected layers. After tuning, its accuracy is 74%, with limited performance due to the lack of spatial modeling capability.
  2. LSTM-RNN: Treats images as pixel sequences and uses LSTM to capture temporal dependencies. However, its accuracy after tuning is only 73%, which does not beat FFNN's 74% (crack features are local spatial patterns, not global sequence dependencies).
  3. CNN: A standard architecture for computer vision that extracts local features through convolutional layers. After tuning, its accuracy is 80%, reflecting the advantages of convolutional operations.
  4. ResNet18 Transfer Learning: Based on ImageNet pre-trained weights, the first layer is modified to adapt to single-channel input. After tuning, its accuracy is 86%, and pre-trained knowledge improves the ability to recognize fine cracks.
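The first-layer modification in item 4 can be illustrated in PyTorch. This is a hedged sketch, not the study's code: it assumes the common approach of replacing the pretrained 3-channel stem convolution with a 1-channel one initialized by averaging the RGB filters, whereas the study only states that the first layer was adapted to single-channel input:

```python
import torch
import torch.nn as nn

def adapt_first_conv(conv_rgb: nn.Conv2d) -> nn.Conv2d:
    """Build a 1-channel replacement for a 3-channel first conv,
    initialized by averaging the pretrained RGB filter weights."""
    conv_gray = nn.Conv2d(1, conv_rgb.out_channels,
                          kernel_size=conv_rgb.kernel_size,
                          stride=conv_rgb.stride,
                          padding=conv_rgb.padding,
                          bias=conv_rgb.bias is not None)
    with torch.no_grad():
        # (64, 3, 7, 7) -> (64, 1, 7, 7): mean over the channel dim
        conv_gray.weight.copy_(conv_rgb.weight.mean(dim=1, keepdim=True))
    return conv_gray

# Stand-in for ResNet18's first layer: 64 filters, 7x7, stride 2, pad 3.
rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray = adapt_first_conv(rgb)
out = gray(torch.randn(1, 1, 227, 227))  # one 227x227 grayscale image
```

In practice one would call `torchvision.models.resnet18(weights=...)` and assign the adapted layer to `model.conv1` before fine-tuning; averaging the pretrained filters preserves the low-level edge detectors that make the model sensitive to fine cracks.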
5

Section 05

Performance Results and Key Findings

Comprehensive performance ranking (after tuning): ResNet18 (86%) > CNN (80%) > FFNN (74%) > LSTM-RNN (73%). Key findings: for all architectures trained from scratch, the crack class is harder to recognize than the non-crack class (fine cracks are hard to distinguish from normal surface texture), so models tend to predict non-crack; transfer learning significantly improves crack-class recall (to 81%). In hyperparameter tuning, FFNN benefited the most (70%→74%), CNN and LSTM-RNN saw limited improvement, and transfer learning improved moderately (84%→86%).
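Per-class recall, the metric behind the crack-recall finding above, can be computed directly from predicted and true labels. A minimal sketch (the function name and label strings are illustrative, not from the study):

```python
def per_class_recall(y_true, y_pred, labels=("non-crack", "crack")):
    """Recall per class: of all true members of a class,
    what fraction did the model correctly identify?"""
    recall = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        recall[lab] = tp / (tp + fn) if (tp + fn) else 0.0
    return recall

# A model biased toward "non-crack" shows high non-crack recall
# but misses true cracks (low crack recall), as the study observed.
r = per_class_recall(
    ["crack", "crack", "non-crack", "non-crack"],
    ["crack", "non-crack", "non-crack", "non-crack"])
```

For safety-critical inspection, crack recall is the number to watch: a missed crack (false negative) is far more costly than a false alarm.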

6

Section 06

Practical Application Value and Model Selection Guide

Application Scenarios: Auxiliary tools for automated quality inspection in manufacturing (improving efficiency), infrastructure safety monitoring (initial screening of crack images). Model Selection: Choose CNN when resources are limited (80% accuracy, high cost-effectiveness); choose ResNet18 transfer learning for optimal performance (requires GPU support); FFNN and LSTM are not recommended for production use.

7

Section 07

Limitations and Future Research Directions

Current Limitations: Single dataset, fine crack recognition still challenging, modern architectures like Vision Transformer not explored, lack of cross-dataset generalization testing. Future Directions: Introduce attention mechanisms to focus on crack regions, multi-scale feature fusion to capture cracks of different sizes, combine semantic segmentation for pixel-level localization, domain adaptation to improve cross-scenario generalization ability.