Zing Forum


Verilog-Based 1D CNN Hardware Accelerator: A Real-Time Anomaly Detection Solution for Industrial IoT Edge

This article introduces a 1D Convolutional Neural Network (CNN) hardware accelerator project implemented using Verilog HDL, designed specifically for Industrial Internet of Things (IIoT) scenarios. It enables millisecond-level anomaly detection of time-series data on edge devices without relying on cloud computing.

Tags: hardware accelerator, verilog, 1d cnn, edge ai, industrial iot, anomaly detection, fpga, real-time inference
Published 2026-05-20 22:12 · Recent activity 2026-05-20 22:18

Section 01

Project Introduction

This article presents a 1D Convolutional Neural Network (CNN) hardware accelerator implemented in Verilog HDL and tailored for Industrial Internet of Things (IIoT) scenarios. It performs millisecond-level anomaly detection on time-series data directly on edge devices, without depending on cloud computing. Targeting industrial motor vibration data, the project classifies a motor's operating state into one of three classes: healthy, bearing failure, or rotor imbalance. It is a typical example of edge AI engineering.


Section 02

Project Background

In modern industrial environments, sensors generate massive volumes of data. Traditional cloud-based analysis has three key pain points: high latency (the window for a fault warning may be missed), high bandwidth consumption (uploading raw data is costly), and security risk (sensitive production data may leak). Hardware neural network accelerators address these issues by running inference on the edge device itself, bringing anomaly detection latency into the microsecond-to-millisecond range needed for real-time response.


Section 03

Hardware Architecture Design

The accelerator adopts a modular design with core components including:

  1. cnn_top.v: Main controller that coordinates execution order and data transfer between layers;
  2. mac_unit.v: Multiply-accumulate (MAC) unit optimized for speed using a two-stage pipeline;
  3. dual_port_bram.v: Dual-port block RAM supporting simultaneous read/write to improve throughput;
  4. conv1d_bram_fsm.v: Convolution layer controller managing sliding window computation logic;
  5. compute_dense_fsm.v: Fully connected layer controller executing matrix multiplication and outputting class confidence scores;
  6. compute_relu.v: ReLU activation unit that clamps negative values to zero, introducing non-linearity.
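The two-stage pipeline in mac_unit.v can be illustrated with a cycle-level Python model. This is a sketch under assumptions, not the project's RTL: it assumes stage 1 registers the product a*b on each clock edge and stage 2 adds the previously registered product into the accumulator, so a dot product of length n takes n cycles plus one flush cycle. The names PipelinedMac and pipelined_dot are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelinedMac:
    """Cycle-level model of a hypothetical two-stage multiply-accumulate pipeline."""
    product_reg: int = 0         # stage-1 register: product latched on the previous edge
    product_valid: bool = False  # whether product_reg holds a real product
    acc: int = 0                 # stage-2 register: running sum

    def clear(self) -> None:
        """Synchronous clear between dot products."""
        self.product_reg, self.product_valid, self.acc = 0, False, 0

    def clock(self, a: int, b: int, valid: bool) -> None:
        """Advance one clock edge; both stages update 'simultaneously'."""
        # Stage 2 consumes what stage 1 registered on the previous edge
        if self.product_valid:
            self.acc += self.product_reg
        # Stage 1 registers the new product (or a bubble when valid is low)
        self.product_reg = a * b if valid else 0
        self.product_valid = valid


def pipelined_dot(xs, ws):
    """Return (dot product, cycles used) when streaming xs/ws through the pipeline."""
    mac = PipelinedMac()
    cycles = 0
    for x, w in zip(xs, ws):
        mac.clock(x, w, True)
        cycles += 1
    mac.clock(0, 0, False)  # flush: let the last product reach the accumulator
    cycles += 1
    return mac.acc, cycles
```

For example, pipelined_dot([1, 2, 3], [4, 5, 6]) returns (32, 4): three products plus one flush cycle. Splitting the multiply and the add across two registers shortens the longest combinational path per cycle, which is what lets a pipelined MAC run at a higher clock frequency.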

Section 04

Neural Network Structure and Inference Process

Neural Network Structure: Input layer (8 consecutive sensor samples) → Conv1D layer (extracts time-series features) → ReLU activation layer → Fully connected layer → Output layer (3 states).

Inference Process:

  1. Data Loading: Sensor data and pre-trained weights are loaded into BRAM;
  2. Convolution Calculation: conv1d_bram_fsm slides the convolution window over the input and drives mac_unit to compute each convolution output;
  3. Activation Processing: Convolution results are processed by the ReLU unit;
  4. Classification Inference: Fully connected layer computes scores for the 3 classes;
  5. Result Output: The class with the highest score is selected as the prediction result.
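The five inference steps can be mirrored in a small integer reference model, which is useful as a golden model when checking RTL outputs. This sketch assumes a kernel size of 3, two filters, valid (unpadded) convolution with stride 1, filter-major flattening, and unbounded integer arithmetic; the project's actual kernel size, filter count, and fixed-point bit widths are not stated, and all function names are hypothetical. As in most CNN frameworks, the "convolution" is computed as cross-correlation (no kernel flip).

```python
from typing import List, Tuple

CLASSES = ("healthy", "bearing_failure", "rotor_imbalance")


def conv1d_valid(x: List[int], kernels: List[List[int]], biases: List[int]) -> List[List[int]]:
    """Valid 1D convolution, stride 1: one output row per filter."""
    k = len(kernels[0])
    out_len = len(x) - k + 1
    return [
        [b + sum(w[t] * x[o + t] for t in range(k)) for o in range(out_len)]
        for w, b in zip(kernels, biases)
    ]


def relu(rows: List[List[int]]) -> List[List[int]]:
    """Clamp negative values to zero."""
    return [[max(0, v) for v in row] for row in rows]


def dense(features: List[int], weights: List[List[int]], biases: List[int]) -> List[int]:
    """Fully connected layer: one score per output class."""
    return [b + sum(w * f for w, f in zip(row, features)) for row, b in zip(weights, biases)]


def argmax(scores: List[int]) -> int:
    """Index of the highest score; the first index wins on ties."""
    best = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best


def infer(x, conv_w, conv_b, fc_w, fc_b) -> Tuple[List[int], int]:
    """Conv1D -> ReLU -> flatten -> dense -> argmax."""
    fmap = relu(conv1d_valid(x, conv_w, conv_b))
    features = [v for row in fmap for v in row]  # filter-major flatten
    scores = dense(features, fc_w, fc_b)
    return scores, argmax(scores)


if __name__ == "__main__":
    # Toy, hand-picked weights (NOT trained): filter 0 is a difference filter, filter 1 a 3-tap sum
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    conv_w = [[1, 0, -1], [1, 1, 1]]
    conv_b = [0, 0]
    fc_w = [[1] * 6 + [0] * 6, [0] * 6 + [1] * 6, [0] * 6 + [-1] * 6]
    fc_b = [0, 0, 0]
    scores, cls = infer(x, conv_w, conv_b, fc_w, fc_b)
    print(scores, CLASSES[cls])  # [0, 81, -81] bearing_failure
```

Because the real hardware works with fixed-width registers, a production golden model would also have to reproduce the RTL's truncation or saturation behavior; the sketch above deliberately leaves that out.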

Section 05

Verification and Testing

The project is verified in simulation with the Xilinx Vivado Simulator. The testbench cnn_top_tb_comprehensive.v loads synthetic input data and predefined weights, runs the full hardware inference flow, and automatically compares the hardware outputs with the expected results. In simulation, the accelerator correctly identified all three states: healthy, bearing failure, and rotor imbalance.
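The data-loading and result-checking steps can be bridged from Python with a few helpers. This is a hedged sketch: it assumes the testbench reads weights and inputs with Verilog's $readmemh system task (one two's-complement hex word per line) and that expected and hardware results are compared element by element. How cnn_top_tb_comprehensive.v actually loads its data is not described, and the helper names and the 8-bit default width are assumptions.

```python
from typing import Iterable, List, Tuple


def to_hex_word(value: int, width: int) -> str:
    """Encode a signed integer as a two's-complement hex word."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in a signed {width}-bit word")
    digits = (width + 3) // 4  # hex digits needed to hold `width` bits
    return format(value & ((1 << width) - 1), f"0{digits}x")


def from_hex_word(text: str, width: int) -> int:
    """Decode a two's-complement hex word (e.g. from a simulation dump) to a signed int."""
    raw = int(text, 16)
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def to_memh(values: Iterable[int], width: int = 8) -> str:
    """One word per line, a layout that $readmemh accepts."""
    return "\n".join(to_hex_word(v, width) for v in values) + "\n"


def compare_outputs(hw: List[int], expected: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, hardware, expected) for each mismatch; an empty list means pass."""
    if len(hw) != len(expected):
        raise ValueError(f"length mismatch: {len(hw)} vs {len(expected)}")
    return [(i, h, e) for i, (h, e) in enumerate(zip(hw, expected)) if h != e]
```

For example, to_memh([1, 0, -1]) produces "01\n00\nff\n"; writing that string to a file such as conv_weights.mem (a hypothetical name) gives something a $readmemh call can load into a BRAM array. The range check in to_hex_word catches weights that would silently wrap when quantized to the hardware word width.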


Section 06

Technical Advantages and Application Prospects

Technical Advantages: ultra-low latency (microsecond-level response), deterministic performance (every inference takes the same number of clock cycles, so there is no timing jitter), low power consumption (dedicated circuits are more energy-efficient than general-purpose processors), and offline operation (no network connection required).

Future Extensions: integrate an ADC to read real sensor data directly, add an AXI-Lite interface for communication with a host CPU, and deploy to FPGA platforms such as Xilinx Artix-7 or Zynq.
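The microsecond-level latency and no-jitter claims can be sanity-checked with a back-of-the-envelope cycle count. This is a rough estimate under stated assumptions, not a measured result: it assumes one MAC per clock cycle, an 8-sample input, a kernel size of 3, two filters, 3 output classes, and a hypothetical 100 MHz clock, and it ignores FSM control and memory overhead apart from an optional pipeline flush cycle per dot product.

```python
def mac_cycles(n_in: int, kernel: int, filters: int, classes: int, flush_per_dot: int = 0) -> int:
    """Clock cycles for one inference, assuming one MAC per cycle."""
    out_len = n_in - kernel + 1        # valid convolution, stride 1
    conv_dots = filters * out_len      # one dot product per conv output
    conv_macs = conv_dots * kernel
    dense_inputs = filters * out_len   # flattened feature map size
    dense_macs = classes * dense_inputs
    # Optional flush cycle after every dot product (conv outputs + class scores)
    flush = flush_per_dot * (conv_dots + classes)
    return conv_macs + dense_macs + flush


def latency_us(cycles: int, clock_mhz: float) -> float:
    """Cycles divided by a clock in MHz gives microseconds."""
    return cycles / clock_mhz
```

Under these assumptions, mac_cycles(8, 3, 2, 3) is 72 cycles, about 0.72 µs at 100 MHz (87 cycles, about 0.87 µs, if each dot product pays one flush cycle). The count depends only on the layer dimensions and never on the data values, which is why a fixed-schedule FSM has no timing jitter.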


Section 07

Project Conclusion

This project demonstrates how an AI model developed in Python is converted into digital circuitry, and it is a typical example of edge AI engineering. For real-time anomaly detection on the factory floor, this hardware-software co-design approach is a useful reference.