nn-accelerator: A Complete ONNX-to-FPGA Neural Network Inference Acceleration Solution

An open-source end-to-end neural network FPGA acceleration solution, including an ONNX compiler, custom instruction set, HLS accelerator IP, and bare-metal firmware, supporting ZYNQ and FMQL platforms.

Tags: FPGA, ONNX, neural network accelerator, HLS, ZYNQ, edge computing, AI inference, embedded AI
Published 2026-06-14 17:41 | Recent activity 2026-06-14 17:49 | Estimated read 8 min

Section 01

nn-accelerator: Open-Source End-to-End ONNX-to-FPGA AI Inference Accelerator

nn-accelerator is an open-source, end-to-end solution for neural network acceleration on FPGAs. It takes ONNX models through to inference on FPGA hardware and comprises an ONNX compiler, a custom instruction set, an HLS accelerator IP, and bare-metal firmware. Target platforms are the Xilinx ZYNQ and Fudan Microelectronics FMQL series, with a focus on edge computing scenarios that need real-time performance and low power consumption. This thread breaks down its background, architecture, technical highlights, applications, and usage steps.

Section 02

Project Background & Overview

Project Overview

nn-accelerator is an open-source project for end-to-end neural network inference acceleration, from ONNX models down to FPGA hardware. Developed by embedded AI engineer Mikeya98, it provides a full toolchain covering model compilation, instruction set design, hardware acceleration, and firmware drivers. Unlike cloud AI inference solutions, it targets edge computing on embedded FPGA devices such as the Xilinx ZYNQ-7045 and Fudan Microelectronics FMQL45/FMQL100TAI, which suits industrial applications with strict real-time and power requirements.

Section 03

System Architecture & Workflow

The system uses a layered architecture in which data flows from the high-level model down to hardware execution:

  1. Model Input: Accepts standard ONNX models, such as those exported from PyTorch or TensorFlow or run with ONNX Runtime.
  2. Compiler: A Python-based core component that parses ONNX models, converts them to a custom intermediate representation, generates optimized machine instructions, and outputs .bin files. It targets a custom 16-bit instruction set optimized for FPGA.
  3. Simulator: A Python-based, cycle-accurate, instruction-level simulator for debugging compiled programs before hardware deployment, so that behavior can be checked against what the real hardware will do.
  4. HLS Accelerator IP: Written in C++ for Vivado HLS, supporting operators such as Conv2D, FullyConnected, MaxPool, ReLU/Sigmoid, GRU, and Add/Mul.
  5. Bare-metal Firmware: For ZYNQ's ARM Cortex-A9, using interrupt-driven execution to manage accelerator configuration, data transfer, and task scheduling without an OS.
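
To make the compiler's final stage more concrete, here is a minimal sketch of packing instructions into 16-bit words and serializing them as a .bin image. The field layout (4-bit opcode, 12-bit operand), the opcode numbers, the little-endian byte order, and every function name are assumptions made for illustration; the project's actual instruction format is not described in this post.

```python
import struct

# Hypothetical opcode table; the real ISA's opcodes are not given in the post.
OPCODES = {"NOP": 0x0, "LOAD": 0x1, "CONV2D": 0x2, "RELU": 0x3, "STORE": 0x4}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic: str, operand: int = 0) -> int:
    """Pack one instruction into a 16-bit word: bits [15:12] opcode, [11:0] operand."""
    if not 0 <= operand < (1 << 12):
        raise ValueError(f"operand {operand} does not fit in 12 bits")
    return (OPCODES[mnemonic] << 12) | operand


def decode(word: int) -> tuple[str, int]:
    """Split a 16-bit word back into (mnemonic, operand)."""
    return MNEMONICS[word >> 12], word & 0x0FFF


def to_bin(program: list[tuple[str, int]]) -> bytes:
    """Serialize a program as little-endian 16-bit words (assumed byte order)."""
    words = [encode(mnemonic, operand) for mnemonic, operand in program]
    return struct.pack(f"<{len(words)}H", *words)


def from_bin(image: bytes) -> list[tuple[str, int]]:
    """Decode a binary image back into (mnemonic, operand) pairs."""
    count = len(image) // 2
    return [decode(word) for word in struct.unpack(f"<{count}H", image)]


# A toy four-instruction program; operands are made-up buffer or config indices.
program = [("LOAD", 0x010), ("CONV2D", 0x003), ("RELU", 0), ("STORE", 0x020)]
image = to_bin(program)
```

Round-tripping `image` through `from_bin` gives back the original program, which is the same kind of check an instruction-level simulator depends on when it loads a compiled .bin file.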

Section 04

Key Technical Highlights

  • End-to-End Completeness: Provides a full toolchain (compiler + IP + firmware) to reduce integration barriers.
  • Custom 16-bit ISA: Designed around FPGA resource constraints and neural network workloads, with the aim of better energy efficiency from fewer resources.
  • Cross-Platform Support: Supports both Xilinx ZYNQ and the China-made FMQL FPGAs from Fudan Microelectronics, which allows deployments built on domestically sourced chips.
  • Solid Engineering Practices: Clear directory structure, complete unit tests, synthesis scripts, and deployment docs; ip_release/ offers plug-and-play Vivado IP and integration guides.

Section 05

Application Scenarios

  • Industrial Vision Inspection: Suitable for production line quality control and defect detection with real-time and low power needs.
  • Embedded Smart Devices: A fit for battery-powered devices such as smart home products, security cameras, and drones (it supports lightweight networks like GRU for voice wake-up or gesture recognition).
  • Academic Research: A reference implementation for neural network accelerator architecture studies.
  • Domestic Chip Validation: Supports FMQL platforms to verify AI capabilities of domestic FPGAs.

Section 06

Quick Start & Development Flow

The usage flow is intuitive:

  1. Prepare an ONNX model with supported operators.
  2. Compile the model using the Python compiler to generate binary instructions.
  3. Verify correctness with the simulator.
  4. Run Vivado HLS synthesis to generate hardware IP.
  5. Integrate the IP into a Vivado project and deploy with the provided bare-metal firmware.

This flow follows the 'software first, hardware verify' philosophy, reducing development iteration time.
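
Steps 1 and 3 of this flow lend themselves to small scripted checks. The sketch below is an illustration under assumptions, not the project's tooling: the set of ONNX op types is a guess at how the operators listed in this post (Conv2D, FullyConnected, MaxPool, ReLU/Sigmoid, GRU, Add/Mul) map onto ONNX, and `unsupported_ops` and `outputs_match` are hypothetical helpers. In practice the op types would be read from the model graph, and the two output lists would come from a reference runtime and the simulator.

```python
# Assumed mapping of the operators listed in the post onto ONNX op types;
# "Gemm" standing in for FullyConnected is a guess.
SUPPORTED_OPS = {"Conv", "Gemm", "MaxPool", "Relu", "Sigmoid", "GRU", "Add", "Mul"}


def unsupported_ops(op_types: list[str]) -> list[str]:
    """Return op types outside the supported set, sorted and without duplicates."""
    return sorted(set(op_types) - SUPPORTED_OPS)


def outputs_match(reference: list[float], simulated: list[float], tol: float = 1e-2) -> bool:
    """Check that simulated outputs agree with a reference within an absolute tolerance."""
    if len(reference) != len(simulated):
        return False
    return all(abs(r - s) <= tol for r, s in zip(reference, simulated))


# Step 1: pre-flight check on a model's op types (hard-coded here for illustration).
model_ops = ["Conv", "Relu", "MaxPool", "Softmax", "Gemm"]
missing = unsupported_ops(model_ops)

# Step 3: compare simulator output with a reference before touching hardware.
reference_out = [0.12, 0.88, 0.50]
simulated_out = [0.125, 0.875, 0.50]
ok = outputs_match(reference_out, simulated_out)
```

Here `missing` flags `Softmax` as an operator that would need to be removed or replaced before compiling. The comparison uses a tolerance rather than exact equality because accelerator arithmetic may not match a floating-point reference bit for bit; the right threshold depends on the model and is left as a parameter.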

Section 07

Conclusion & Outlook

nn-accelerator is a technically solid and well-engineered open-source project that helps fill a gap in end-to-end ONNX-to-FPGA deployment toolchains. It provides not only runnable code but also a systematic methodology for edge AI accelerator design. For developers looking to deploy neural networks on FPGAs, it is a good starting point for product development, academic research, or learning HLS and embedded AI. As edge AI demand grows, open-source infrastructure of this kind will play an increasingly important role in making the technology widely accessible.