
nn-accelerator: A Complete ONNX-to-FPGA Neural Network Inference Acceleration Solution

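An open-source, end-to-end FPGA acceleration solution for neural networks, comprising an ONNX compiler, a custom instruction set, HLS accelerator IP, and bare-metal firmware, with support for the ZYNQ and FMQL platforms.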

FPGA · ONNX · Neural Network Accelerator · HLS · ZYNQ · Edge Computing · AI Inference · Embedded AI

Published 2026/06/14 17:41 · Last activity 2026/06/14 17:49 · Estimated reading time: 7 minutes

Section 01

nn-accelerator: Open Source End-to-End ONNX to FPGA AI Inference Accelerator

nn-accelerator is an open-source, end-to-end FPGA acceleration solution for neural networks. It converts ONNX models into FPGA-based inference and comprises an ONNX compiler, a custom instruction set, HLS accelerator IP, and bare-metal firmware. Target platforms are Xilinx ZYNQ and the Fudan Microelectronics FMQL series, with a focus on edge computing scenarios that require real-time performance and low power consumption. This thread covers its background, architecture, technical highlights, application scenarios, and usage steps.

Section 02

Project Background & Overview

Original Author & Source

Author/Maintainer: Mikeya98
Source: GitHub
Release Time: 2026-06-14
Link: https://github.com/Mikeya98/nn-accelerator

Project Overview

nn-accelerator is a complete open-source project that targets end-to-end neural network inference acceleration, from ONNX models to FPGA hardware. Developed by embedded AI engineer Mikeya98, it provides a full toolchain covering model compilation, instruction set design, hardware acceleration, and firmware drivers. Unlike cloud AI inference solutions, it focuses on edge computing and supports embedded FPGA devices such as the Xilinx ZYNQ-7045 and the Fudan Microelectronics FMQL45/FMQL100TAI, making it suitable for industrial applications with strict real-time and power requirements.

Section 03

System Architecture & Workflow

The system uses a layered architecture in which data flows from the high-level model down to hardware execution:

Model Input: Accepts models in the standard ONNX format, which frameworks such as PyTorch and TensorFlow can export and ONNX Runtime can execute.
Compiler: The core component, written in Python. It parses ONNX models, converts them into a custom intermediate representation (IR), generates optimized machine instructions for a custom 16-bit instruction set designed for the FPGA, and outputs them as .bin files.
Simulator: A cycle-accurate, instruction-level simulator written in Python, used to debug compiled programs before hardware deployment and to confirm that they behave the same way they will on the real hardware.
HLS Accelerator IP: Written in C++ for Vivado HLS, supporting operators including Conv2D, FullyConnected, MaxPool, ReLU/Sigmoid, GRU, and Add/Mul.
Bare-metal Firmware: Runs on the ZYNQ's ARM Cortex-A9 processor without an operating system, using interrupt-driven execution to manage accelerator configuration, data transfer, and task scheduling.
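
Hardware operators like those in the HLS IP are usually checked against a plain software reference before and after synthesis. The sketch below shows what such a "golden model" for Conv2D, ReLU, and MaxPool might look like. It is a minimal illustration, assuming single-channel float tensors stored as nested Python lists and no padding; the function names are hypothetical and not taken from the repository.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix, stride: int = 1) -> Matrix:
    """Single-channel 2D convolution with no padding.

    Like ONNX Conv, this is a cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(k), len(k[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i * stride + a][j * stride + b] * k[a][b]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def maxpool2d(x: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Max over each size x size window, moving by `stride`, no padding."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [
        [
            max(x[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

Running the same input through the simulator (or the accelerator) and through these functions gives a simple per-operator correctness check; a real reference would also need multi-channel tensors, padding, and whatever numeric format the hardware uses.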

Section 04

Key Technical Highlights

End-to-End Completeness: Provides a full toolchain (compiler + IP + firmware), lowering the barrier to integration.
Custom 16-bit ISA: Designed around FPGA resource constraints and neural network workloads, aiming for higher energy efficiency with fewer hardware resources.
Cross-Platform Support: Supports both Xilinx ZYNQ and Fudan Microelectronics FMQL (domestic Chinese) FPGAs, enabling solutions built on domestically controllable hardware.
Solid Engineering Practices: A clear directory structure, comprehensive unit tests, synthesis scripts, and deployment documentation; the ip_release/ directory provides plug-and-play Vivado IP along with integration guides.
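
The article does not describe the bit layout of the 16-bit ISA, so the following is only a hypothetical illustration of how a compiler might pack fixed-width instructions into a .bin file. It assumes a 4-bit opcode in bits 15..12, a 12-bit operand in bits 11..0, an invented opcode table, and little-endian word order; none of these details come from nn-accelerator itself.

```python
import struct
from typing import List, Tuple

# Hypothetical opcode table; NOT the real nn-accelerator encoding.
OPCODES = {"NOP": 0x0, "LOAD": 0x1, "STORE": 0x2, "CONV": 0x3, "FC": 0x4,
           "MAXPOOL": 0x5, "RELU": 0x6, "ADD": 0x7, "HALT": 0xF}
OPNAMES = {v: k for k, v in OPCODES.items()}

Instr = Tuple[str, int]


def encode(op: str, operand: int) -> int:
    """Pack a 4-bit opcode (bits 15..12) and a 12-bit operand (bits 11..0)."""
    if not 0 <= operand < (1 << 12):
        raise ValueError(f"operand {operand} does not fit in 12 bits")
    return (OPCODES[op] << 12) | operand


def decode(word: int) -> Instr:
    """Split a 16-bit word back into (mnemonic, operand); unknown opcodes raise KeyError."""
    return OPNAMES[(word >> 12) & 0xF], word & 0xFFF


def to_bin(program: List[Instr]) -> bytes:
    """Serialize instructions as little-endian unsigned 16-bit words."""
    words = [encode(op, arg) for op, arg in program]
    return struct.pack(f"<{len(words)}H", *words)


def from_bin(blob: bytes) -> List[Instr]:
    """Parse a blob produced by to_bin back into instructions."""
    if len(blob) % 2:
        raise ValueError("binary length must be a multiple of 2 bytes")
    count = len(blob) // 2
    return [decode(w) for w in struct.unpack(f"<{count}H", blob)]


if __name__ == "__main__":
    program = [("LOAD", 0x010), ("CONV", 3), ("RELU", 0), ("HALT", 0)]
    blob = to_bin(program)
    assert from_bin(blob) == program  # the encoding round-trips
    print(blob.hex())
```

A fixed 16-bit width keeps the hardware decoder trivial (one word per instruction, fixed field positions), at the cost of a small operand range; real accelerator ISAs often work around that by loading larger addresses or sizes through configuration registers.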

Section 05

Application Scenarios

Industrial Vision Inspection: Production-line quality control and defect detection, where real-time response and low power consumption are required.
Embedded Smart Devices: Well suited to battery-powered devices such as smart-home products, security cameras, and drones; support for GRU layers allows lightweight recurrent models for tasks like voice wake-up or gesture recognition.
Academic Research: A reference implementation for studying neural network accelerator architectures.
Domestic Chip Validation: Support for FMQL platforms makes it usable for validating the AI capabilities of domestic (Chinese-made) FPGAs.

Section 06

Quick Start & Development Flow

The usage flow is intuitive:

Prepare an ONNX model that uses only supported operators.
Compile the model with the Python compiler to generate binary instructions.
Verify correctness with the simulator.
Run Vivado HLS synthesis to generate the hardware IP.
Integrate the IP into a Vivado project and deploy it with the provided bare-metal firmware.
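
The first step, checking that a model uses only supported operators, can be automated before compiling. Below is a minimal sketch assuming a hand-written mapping from ONNX op_type names to the operator list given in the architecture section; the mapping and the unsupported_ops helper are hypothetical, not part of the repository. Treating MatMul as FullyConnected is also an assumption: PyTorch's nn.Linear typically exports as Gemm, but some exports produce MatMul followed by Add.

```python
from typing import Dict, Iterable, List

# Assumed mapping from ONNX op_type names to the accelerator operators
# listed in the article. ONNX has no "FullyConnected" op.
SUPPORTED: Dict[str, str] = {
    "Conv": "Conv2D",
    "Gemm": "FullyConnected",
    "MatMul": "FullyConnected",  # assumption; may require an adjacent Add
    "MaxPool": "MaxPool",
    "Relu": "ReLU",
    "Sigmoid": "Sigmoid",
    "GRU": "GRU",
    "Add": "Add",
    "Mul": "Mul",
}


def unsupported_ops(op_types: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated op types with no accelerator mapping."""
    return sorted({op for op in op_types if op not in SUPPORTED})


# With the `onnx` package installed, the op types of a real model can be
# collected like this ("model.onnx" is a placeholder path):
#   import onnx
#   ops = [node.op_type for node in onnx.load("model.onnx").graph.node]
#   print(unsupported_ops(ops))
```

An empty result means every node has a candidate mapping; it does not guarantee that attributes such as kernel size, stride, or padding fall within what the hardware supports.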

This flow follows a "software first, hardware verify" philosophy, which shortens development iteration time.
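
In a software-first flow, the simulator output acts as the expected result, and data read back from the board is compared against it. The sketch below shows one way to do that comparison, assuming both results are flat integer buffers (for example fixed-point values); compare_outputs and its tol parameter are hypothetical names, not part of the project.

```python
from typing import Sequence, Tuple


def compare_outputs(sim: Sequence[int], hw: Sequence[int], tol: int = 0) -> Tuple[int, int]:
    """Compare simulator and hardware outputs element by element.

    Returns (number of elements whose absolute difference exceeds tol,
    largest absolute difference seen).
    """
    if len(sim) != len(hw):
        raise ValueError(f"length mismatch: {len(sim)} vs {len(hw)}")
    diffs = [abs(a - b) for a, b in zip(sim, hw)]
    mismatches = sum(1 for d in diffs if d > tol)
    return mismatches, max(diffs, default=0)
```

With a bit-exact simulator, tol=0 is the natural setting; a small nonzero tolerance is only worth using if the simulator and hardware are known to round differently.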

Section 07

Conclusion & Outlook

nn-accelerator is a technically solid, well-engineered open-source project that helps fill a gap in end-to-end ONNX-to-FPGA deployment toolchains. It provides not only runnable code but also a systematic methodology for designing edge AI accelerators. For developers looking to deploy neural networks on FPGAs, it is an excellent starting point for product development, academic research, or learning HLS and embedded AI. As demand for edge AI grows, open-source infrastructure of this kind will play an increasingly important role in making the technology widely accessible.