Zing Forum


Butterfly Species Classification Based on Deep Learning: A Complete Computer Vision Project Practice

This article introduces the Butterfly-Image-Classification project, which uses TensorFlow and convolutional neural networks (CNNs) to classify images of 10 butterfly species. The project covers the complete machine learning workflow (automatic dataset download from Zenodo, image preprocessing, data augmentation, training of two CNN architectures, evaluation, and visualization) and also ships precompiled executables for multiple platforms.

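Tags: Butterfly Classification · Convolutional Neural Networks · Computer Vision · TensorFlow · Image Classification · Data Augmentation · Batch Normalization · Dropout · CI/CD · Biodiversity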
Published 2026-05-16 17:02 · Recent activity 2026-05-16 17:09 · Estimated read 8 min

Section 01

[Introduction] Complete Project Practice for Butterfly Species Classification Based on Deep Learning

The Butterfly-Image-Classification project uses TensorFlow and convolutional neural networks (CNNs) to classify images of 10 butterfly species. It covers the full machine learning workflow, from automatically downloading the dataset from Zenodo through image preprocessing, data augmentation, training two model architectures, evaluation, and visualization, and it provides precompiled executables for multiple platforms. Beyond implementing a high-performance classifier, the project demonstrates the software engineering practices expected of production-level machine learning code: modular structure, automated tests, and CI/CD.


Section 02

Background: Integration of Biodiversity Conservation and AI

Global biodiversity is under threat: an estimated 1 million species are at risk of extinction, and insect populations are declining significantly. Butterflies are important ecological indicator species, but identifying them manually is time-consuming, labor-intensive, and requires specialist knowledge. Computer vision makes automated species identification possible, and fast, accurate identification can support biodiversity monitoring, ecological research, and citizen science. This project works in that direction and shows how to build a complete, production-level project from data to deployment.


Section 03

Dataset and Dual CNN Model Architecture

Dataset: The project uses the Leeds Butterfly Dataset, hosted on Zenodo, which contains RGB color images of 10 butterfly species, and includes a function that downloads it automatically. Images of the same species vary in pose, lighting, and other conditions, which makes classification more challenging.
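The automatic download step can be sketched with the standard library alone. This is a minimal sketch, assuming the dataset is distributed as a single zip archive; `download_dataset` and the `dataset.zip`/`extracted` names are hypothetical, and the real Zenodo URL is deliberately left as a parameter rather than guessed. Both the archive and the extracted files are cached, so repeated runs skip the download.

```python
import pathlib
import urllib.request
import zipfile


def download_dataset(url: str, data_dir: pathlib.Path) -> pathlib.Path:
    """Download a zip archive from `url` into `data_dir` and extract it once.

    Hypothetical helper: `url` should be the dataset's Zenodo download link.
    Returns the directory that holds the extracted files.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    archive = data_dir / "dataset.zip"
    extract_dir = data_dir / "extracted"

    # Already extracted on a previous run: nothing to do.
    if extract_dir.exists() and any(extract_dir.iterdir()):
        return extract_dir

    # Download only if the archive is not cached yet.
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)
    return extract_dir
```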

Technical Architecture:

  • Basic CNN: Convolutional layers (extract local features), pooling layers (reduce spatial dimensions), and fully connected layers (map features to the 10 classes); this model serves as the performance baseline.
  • Improved CNN: Adds batch normalization (speeds up convergence and stabilizes gradients) and Dropout (regularization that improves generalization).
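A minimal Keras sketch of the two architectures. The three convolutional blocks, the filter counts (32/64/128), the 128-unit dense layer, and the 0.5 dropout rate are illustrative assumptions, not the project's actual values. The point is the structural difference: the improved model keeps the same layout and only inserts batch normalization after each convolution and Dropout around the dense layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 10
INPUT_SHAPE = (128, 128, 3)


def build_basic_cnn() -> tf.keras.Model:
    """Baseline: conv -> pool blocks, then dense layers mapping to 10 classes."""
    return tf.keras.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def conv_bn_block(filters: int) -> list:
    """Conv -> BatchNorm -> ReLU -> pool (bias is redundant before BatchNorm)."""
    return [
        layers.Conv2D(filters, 3, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
    ]


def build_improved_cnn(dropout_rate: float = 0.5) -> tf.keras.Model:
    """Same layout as the baseline plus batch normalization and Dropout."""
    model_layers = [layers.Input(shape=INPUT_SHAPE)]
    for filters in (32, 64, 128):
        model_layers += conv_bn_block(filters)
    model_layers += [
        layers.Flatten(),
        layers.Dropout(dropout_rate),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ]
    return tf.keras.Sequential(model_layers)
```

Keeping everything except the two added techniques identical makes the comparison between the models meaningful: any accuracy gap can be attributed to batch normalization and Dropout.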

Hyperparameters: Managed centrally in scripts/config.py (for example IMAGE_SIZE=(128,128), EPOCHS=30, BATCH_SIZE=32), which keeps experiments reproducible.
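A sketch of what a central config module in the spirit of scripts/config.py might look like. Only IMAGE_SIZE, EPOCHS, and BATCH_SIZE come from the article; NUM_CLASSES, SEED, HOLDOUT_FRACTION, and `set_seed` are hypothetical additions that show how fixed values and a fixed seed support reproducibility.

```python
"""Hypothetical central configuration module (values partly from the article)."""
import random

# Values quoted in the article.
IMAGE_SIZE = (128, 128)
EPOCHS = 30
BATCH_SIZE = 32

# Hypothetical additions for illustration.
NUM_CLASSES = 10          # one output unit per butterfly species
SEED = 42                 # fixed seed so shuffles and splits repeat exactly
HOLDOUT_FRACTION = 0.2    # share of data held out from training (80/20)


def set_seed(seed: int = SEED) -> None:
    """Seed Python's RNG. A TensorFlow project would also call
    tf.keras.utils.set_random_seed(seed), which seeds Python, NumPy and TF."""
    random.seed(seed)
```

Other modules then import these names instead of hard-coding numbers, so changing an experiment means editing one file.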


Section 04

Complete Workflow: From Data Preprocessing to Model Evaluation

Data Preprocessing and Augmentation:

  • Preprocessing: Load the images, resize them to 128×128, normalize pixel values, and split the data into training, validation, and test sets, with 80% used for training and 20% held out.
  • Augmentation: Generate 4 augmented copies of each image using random horizontal flips, rotations, brightness adjustments, crops, and similar transformations to improve generalization.
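The preprocessing and augmentation steps above can be sketched with `tf.image` ops. The function names, the use of 90-degree rotations (a simplification of arbitrary-angle rotation), the brightness delta of 0.1, and the 90% crop size are assumptions for illustration; the project's own augmentation parameters are not stated here.

```python
import tensorflow as tf

IMAGE_SIZE = (128, 128)
NUM_AUGMENTED_COPIES = 4


def preprocess(image: tf.Tensor) -> tf.Tensor:
    """Resize to IMAGE_SIZE and scale pixel values from [0, 255] to [0, 1]."""
    image = tf.image.resize(image, IMAGE_SIZE)  # output is float32
    return image / 255.0


def augment(image: tf.Tensor) -> tf.Tensor:
    """Produce one random variant of a preprocessed (square) image."""
    image = tf.image.random_flip_left_right(image)
    # Random rotation by 0, 90, 180 or 270 degrees.
    k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
    image = tf.image.rot90(image, k)
    image = tf.image.random_brightness(image, max_delta=0.1)
    # Random crop to 90% of each side, then resize back to full size.
    crop = (int(IMAGE_SIZE[0] * 0.9), int(IMAGE_SIZE[1] * 0.9), 3)
    image = tf.image.random_crop(image, crop)
    image = tf.image.resize(image, IMAGE_SIZE)
    # Brightness shifts can leave [0, 1]; clip back into range.
    return tf.clip_by_value(image, 0.0, 1.0)


def augmented_copies(image: tf.Tensor) -> tf.Tensor:
    """Stack NUM_AUGMENTED_COPIES independent random variants of one image."""
    return tf.stack([augment(image) for _ in range(NUM_AUGMENTED_COPIES)])
```

Augmenting after normalization keeps every variant in the same [0, 1] range the model is trained on.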

Model Training: Training uses TensorFlow Keras with a cross-entropy loss, the Adam optimizer, and callbacks for early stopping, model checkpointing, and learning-rate scheduling.
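A sketch of that training setup in Keras. It assumes integer class labels (hence the sparse variant of cross-entropy) and uses `ReduceLROnPlateau` as one possible learning-rate schedule; the article does not say which scheduler the project uses. The patience values, the 0.5 reduction factor, the 1e-3 learning rate, and `checkpoint_path` are illustrative assumptions.

```python
import tensorflow as tf


def compile_and_fit(model, train_ds, val_ds, epochs=30,
                    checkpoint_path="best_model.keras"):
    """Compile with Adam + cross-entropy and train with three callbacks."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        # Labels are integers 0..9, so use the sparse cross-entropy variant.
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    callbacks = [
        # Stop when validation loss stops improving; restore the best weights.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True),
        # Save the model only when validation loss reaches a new best.
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True),
        # Halve the learning rate after 3 epochs without improvement.
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=3),
    ]
    return model.fit(train_ds, validation_data=val_ds,
                     epochs=epochs, callbacks=callbacks)
```

Monitoring `val_loss` in all three callbacks keeps their decisions consistent: the checkpoint, the learning-rate drops, and the early stop all react to the same signal.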

Evaluation and Visualization: The project generates a confusion matrix (showing which species are confused with each other), per-class accuracy (identifying weak classes), training-history curves (diagnosing convergence and overfitting), and example predictions (displaying results visually).
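The confusion matrix and per-class accuracy are simple to compute. This plain-Python sketch shows the arithmetic behind them; in a TensorFlow project `tf.math.confusion_matrix` gives the same matrix. The function names are illustrative, not the project's own.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal entry over row sum: the share of each class predicted correctly.

    Returns None for a class with no samples instead of dividing by zero.
    """
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(row[i] / total if total else None)
    return result
```

For example, with true labels `[0, 0, 1, 1, 2]` and predictions `[0, 1, 1, 1, 0]`, the matrix is `[[1, 1, 0], [0, 2, 0], [1, 0, 0]]` and the per-class accuracy is `[0.5, 1.0, 0.0]`: class 2 is never recognized, which the overall accuracy of 60% alone would hide.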


Section 05

Software Engineering Practices: Guarantee for Production-Level Projects

Modular Code: The code is organized into independent modules (data download, preprocessing, model definition, training, etc.), which keeps coupling low.

Test Suite: pytest tests cover data download, preprocessing, training, and other functionality, with test coverage of at least 80%.
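A sketch of the kind of unit test such a suite contains. `split_indices` is a hypothetical helper standing in for the project's data-splitting code; the tests check that an 80/20 split has the right sizes, loses or duplicates no samples, and is reproducible with a fixed seed. pytest discovers functions named `test_*` and reports any failing `assert`.

```python
import random


def split_indices(n, train_fraction=0.8, seed=42):
    """Hypothetical helper: shuffle 0..n-1 reproducibly, then split in two."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


def test_split_is_80_20_and_disjoint():
    train, held_out = split_indices(100)
    assert len(train) == 80 and len(held_out) == 20
    # No sample may appear in both sets, and none may go missing.
    assert set(train).isdisjoint(held_out)
    assert sorted(train + held_out) == list(range(100))


def test_split_is_reproducible():
    assert split_indices(50, seed=1) == split_indices(50, seed=1)
```

Using a local `random.Random(seed)` instance instead of the global RNG keeps the split independent of any other code that draws random numbers.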

CI/CD Pipeline: GitHub Actions automates code style checks, unit tests, matrix testing across multiple Python versions, and building of the cross-platform executables.

Cross-Platform Deployment: Precompiled executables are provided for Linux, Windows, and macOS; they run without installing any dependencies, and the data storage location adapts automatically to the environment.
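One plausible way to make the storage location adapt to the environment is to follow each operating system's convention for per-user application data. This is a sketch under that assumption; the article does not describe the project's actual logic, and `APP_NAME` and `default_data_dir` are hypothetical. The platform and environment are parameters so the function can be tested on any machine.

```python
import os
import pathlib
import sys

APP_NAME = "butterfly-classifier"  # hypothetical directory name


def default_data_dir(platform_name: str = sys.platform, env=None) -> pathlib.Path:
    """Return a per-user data directory following the OS's conventions."""
    env = os.environ if env is None else env
    home = pathlib.Path(env.get("HOME") or pathlib.Path.home())
    if platform_name.startswith("win"):
        # Windows: %LOCALAPPDATA%, normally C:\Users\<user>\AppData\Local.
        base = pathlib.Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform_name == "darwin":
        # macOS: ~/Library/Application Support.
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix: XDG Base Directory spec, default ~/.local/share.
        base = pathlib.Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return base / APP_NAME
```

A standalone executable cannot assume it runs from a writable project checkout, so resolving a user-writable directory like this avoids writing the dataset next to the binary.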


Section 06

Application Scenarios and Future Improvement Directions

Current Applications: Citizen science (naturalists identifying butterflies), education (demonstrating CV and deep learning), ecological monitoring (assisting population surveys).

Future Directions: Expanding the range of species, mobile deployment, transfer learning (using ResNet or EfficientNet backbones to improve accuracy), uncertainty estimation (reporting confidence levels), and crowdsourced data collection (enriching the dataset).
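The transfer-learning direction could look like the following Keras sketch: a pretrained EfficientNetB0 backbone is frozen and only a small new classification head is trained at first. The head design and the 0.2 dropout rate are assumptions. Keras' EfficientNet models include their own rescaling and expect pixel values in [0, 255], so the sketch assumes inputs normalized to [0, 1] as in the project's preprocessing and scales them back.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_transfer_model(num_classes=10, input_shape=(128, 128, 3),
                         weights="imagenet"):
    """Frozen EfficientNetB0 backbone plus a new softmax classification head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # train the head first; unfreeze later to fine-tune

    inputs = tf.keras.Input(shape=input_shape)
    # Inputs are assumed to be in [0, 1]; EfficientNet expects [0, 255].
    x = layers.Rescaling(255.0)(inputs)
    # training=False keeps the backbone's BatchNorm layers in inference mode.
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Because the backbone is frozen, only the final dense layer's kernel and bias are trainable at first, which lets a small dataset like this one train the head without overwriting the pretrained features.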


Section 07

Conclusion: A Model AI Project Under Engineering Thinking

This project shows how to turn a machine learning idea into production-level software, attending not only to model performance but also to code quality, testing, automated deployment, and user experience. As AI tools become widespread, this kind of engineering discipline matters as much as the model itself. For developers learning computer vision and deep learning, the project is a useful practical case: it covers the complete workflow with clear code, comprehensive documentation, and thorough testing.