Zing Forum

SystemVerilog Implementation of Neural Network Inference: Fixed-Point Quantization and Hardware Deployment Practice

This project demonstrates how to convert a Python-trained neural network into a SystemVerilog hardware implementation using the Q3.12 fixed-point quantization format, achieving a test accuracy of 92.98% on the breast cancer classification task and providing a reproducible reference implementation for AI chip design.

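Tags: SystemVerilog, neural network hardware implementation, fixed-point quantization, FPGA, AI chips, edge computing, digital circuit design, model deployment
Published 2026-06-08 16:42 | Recent activity 2026-06-08 16:54 | Estimated read 6 min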

Section 01

Project Introduction: Fixed-Point Practice for Neural Network Inference in SystemVerilog

This project was developed by Kiana Jafari; the source code is hosted on GitHub (link) and was released on June 8, 2026. It converts a neural network trained in Python into a SystemVerilog hardware implementation using the Q3.12 fixed-point format, and reports a test accuracy of 92.98% on the breast cancer classification task. The result is intended as a reproducible reference implementation for AI chip design.

Section 02

Background: Core Challenges in Neural Network Hardware Implementation

The spread of deep learning has driven demand for deployment on dedicated hardware (FPGA/ASIC), whose low power consumption and high throughput suit edge computing. Implementing floating-point (FP32) arithmetic in hardware consumes significant resources, however, so quantization is a key technique: floating-point weights and activations are converted to fixed-point numbers to balance accuracy against hardware complexity. This project demonstrates an end-to-end workflow from Python training to SystemVerilog implementation.

Section 03

Project Architecture and Network Design

The repository is organized into three directories:

  1. Data Directory: Stores the Wisconsin Breast Cancer Dataset (569 samples, 30 features) and preprocessing scripts;
  2. Python Directory: Trains a minimal 2-4-2 network: 2 input neurons → 4 hidden neurons (ReLU activation) → 2 output neurons (Softmax during training, simplified to Argmax for inference);
  3. SystemVerilog Directory: Core hardware implementation code.

Because the network takes only 2 inputs, the 30 original features are first reduced to 2 dimensions (e.g., via PCA).
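
To make the 2-4-2 structure concrete, here is a minimal floating-point sketch of the forward pass in plain Python. It is not the project's code: the weight and bias values are hypothetical placeholders, and the presence of bias terms is itself an assumption, since the description only mentions weights. It also shows why Softmax can be replaced by Argmax at inference time: Softmax preserves the ordering of the logits, so the predicted class is the same.

```python
import math

def dense(x, weights, biases):
    # weights[j][i] is the weight from input i to neuron j
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(v):
    # index of the largest element (first one wins on ties)
    return max(range(len(v)), key=lambda i: v[i])

# Hypothetical parameters, for illustration only (not the trained model)
W1 = [[0.5, -0.25], [1.0, 0.75], [-0.5, 0.5], [0.25, 0.25]]  # 4 neurons x 2 inputs
B1 = [0.0, -0.5, 0.25, 0.0]                                  # assumed biases
W2 = [[1.0, -0.5, 0.5, 0.25], [-1.0, 0.5, -0.25, 0.75]]      # 2 neurons x 4 inputs
B2 = [0.1, -0.1]                                             # assumed biases

def forward(x):
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)  # raw logits

logits = forward([1.0, 2.0])
# Softmax is monotonic, so the class decision can be taken on the logits
assert argmax(softmax(logits)) == argmax(logits)
```

In hardware this means the exponentials and the division of Softmax never need to be built; a comparator on the two output values is enough.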

Section 04

Quantization Strategy: Analysis of Q3.12 Fixed-Point Format

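The project uses the Q3.12 fixed-point format, described as a 16-bit word with 3 integer bits (including the sign bit) and 12 fractional bits. Three signed integer bits give a range of -4 to about 3.9998 (that is, 4 - 2^-12), and 12 fractional bits give a resolution of 2^-12, about 0.00024. These fields account for 15 of the 16 bits; how the remaining bit is used (for example as sign extension) is not specified. Quantization is post-training: the model is first trained in floating point, and the weights are then converted to fixed point, trading a small amount of accuracy for lower resource overhead.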
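
The conversion between real values and Q3.12 integers can be sketched in a few lines of Python. This is an illustration rather than the project's script: the names `to_fixed` and `to_float` are hypothetical, round-to-nearest and saturation are assumed behaviors, and the limits follow the stated range of -4 to 4 - 2^-12.

```python
import math

FRAC_BITS = 12
SCALE = 1 << FRAC_BITS   # 4096 steps per unit
Q_MIN = -4 * SCALE       # -16384, represents -4.0
Q_MAX = 4 * SCALE - 1    #  16383, represents 4 - 2**-12

def to_fixed(x):
    """Quantize a real value to a Q3.12 integer (round to nearest, saturate)."""
    q = math.floor(x * SCALE + 0.5)  # assumed rounding mode: round half up
    return max(Q_MIN, min(Q_MAX, q))

def to_float(q):
    """Convert a Q3.12 integer back to a real value."""
    return q / SCALE

print(to_fixed(1.0))            # 4096
print(to_float(to_fixed(0.1)))  # 0.10009765625, off by less than half a step
print(to_float(to_fixed(10.0))) # 3.999755859375, saturated at the upper limit
```

For in-range values the rounding error is at most half a step (2^-13), which is why small weights survive quantization with little loss, while out-of-range values are clipped rather than wrapped.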

Section 05

Hardware Implementation and Development Workflow

Hardware Modules

  • Matrix Multiplication Unit: Implements operations from input to hidden layer (2×4) and hidden layer to output layer (4×2);
  • Activation Function Module: Lightweight implementation of ReLU (max(0,x)) and Argmax;
  • Data Path: Handles fixed-point overflow issues;
  • Storage Architecture: Weights stored in on-chip RAM/ROM or registers.
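
The behavior of these modules can be modeled with integer arithmetic only, which is a common way to obtain a bit-accurate reference before writing RTL. The sketch below is such a model in Python under stated assumptions, not a description of the project's actual datapath: products are accumulated at full width, rescaled with an arithmetic right shift by 12, and saturated to the Q3.12 range, and the bias input is an assumption.

```python
FRAC_BITS = 12
Q_MIN, Q_MAX = -(1 << 14), (1 << 14) - 1  # Q3.12 limits: -4.0 and 4 - 2**-12

def saturate(v):
    # clamp instead of wrapping, one possible overflow policy
    return max(Q_MIN, min(Q_MAX, v))

def fixed_dense(x_q, w_q, b_q):
    """Matrix-vector product on Q3.12 integers with a wide accumulator."""
    out = []
    for row, b in zip(w_q, b_q):
        acc = b << FRAC_BITS           # align bias with the 24-fractional-bit products
        for w, x in zip(row, x_q):
            acc += w * x               # Q3.12 * Q3.12 has 24 fractional bits
        out.append(saturate(acc >> FRAC_BITS))  # arithmetic shift back to 12 bits
    return out

def fixed_relu(v):
    return [x if x > 0 else 0 for x in v]

def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])

# 1.0 * 1.0 + 1.0 * 2.0 = 3.0, which is 12288 in Q3.12
print(fixed_dense([4096, 8192], [[4096, 4096]], [0]))  # [12288]
# 2.0 + 2.0 = 4.0 does not fit, so the result saturates
print(fixed_dense([8192, 8192], [[4096, 4096]], [0]))  # [16383]
```

Python's `>>` on negative integers is an arithmetic shift, which matches the `>>>` operator on signed operands in SystemVerilog, so the rounding behavior (toward negative infinity) carries over.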

Development Workflow

  1. Floating-point model training;
  2. Quantization calibration (determine scaling factors/zero points);
  3. Quantized model validation;
  4. SystemVerilog implementation;
  5. Simulation validation (compare with Python results);
  6. Synthesis and deployment.
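
Two of the steps above lend themselves to a short Python sketch: handing quantized weights to the hardware side, and comparing predictions between the two implementations. The code below is one possible approach, not the project's documented flow. It assumes the weights are exported as 16-bit two's-complement hex words (a format that SystemVerilog's `$readmemh` system task can load), and the helper names and weight values are hypothetical.

```python
import math

FRAC_BITS = 12
Q_MIN, Q_MAX = -(1 << 14), (1 << 14) - 1  # Q3.12 limits

def quantize(x):
    q = math.floor(x * (1 << FRAC_BITS) + 0.5)  # assumed: round to nearest
    return max(Q_MIN, min(Q_MAX, q))

def to_hex16(q):
    # 16-bit two's-complement encoding as four hex digits
    return format(q & 0xFFFF, "04x")

def hex_lines(weights):
    # one word per line, row-major order (an assumed memory layout)
    return [to_hex16(quantize(w)) for row in weights for w in row]

def agreement(reference, candidate):
    # fraction of samples where both implementations predict the same class
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matches / len(reference)

# Hypothetical hidden-layer weights, for illustration only
W1 = [[0.5, -0.25], [1.0, 0.75], [-0.5, 0.5], [0.25, 0.25]]
lines = hex_lines(W1)
print("\n".join(lines))  # 0800, fc00, 1000, 0c00, f800, 0800, 0400, 0400

# Comparing class predictions from the Python model and the simulation
print(agreement([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```

Comparing predicted classes sample by sample isolates the error introduced by quantization and the hardware datapath from the error of the model itself, which a plain accuracy figure would mix together.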

Section 06

Application Value and Improvement Directions

Value

  • Reproducible reference for edge AI developers;
  • End-to-end case for digital chip design learners;
  • Modular starting point for AI chip design.

Limitations

  • Small network scale (only 16 weights: 2×4 + 4×2);
  • Simple quantization strategy (post-training quantization only);
  • No complete description of the verification environment.

Improvement Directions

  • Extend to LeNet/small ResNet;
  • Support convolutional layers;
  • Adopt Quantization-Aware Training (QAT);
  • Provide FPGA deployment tutorials and performance benchmarks.