Section 01
[Introduction] PCCX: An Open-Source NPU Architecture for Transformer Inference on Edge FPGAs
PCCX is a hardware-software co-optimization framework for Transformer-based large language model (LLM) inference on edge devices. Targeting the Xilinx Kria KV260 development board, it addresses the memory-bandwidth bottleneck with W4A8 quantization (4-bit weights, 8-bit activations), a custom VLIW instruction set, and a split data path. Its core goal is to accelerate autoregressive decoding on resource-constrained edge hardware.
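
To make the W4A8 idea concrete, here is a minimal sketch of a dot product with 4-bit weights and 8-bit activations, plus the weight-storage arithmetic that motivates it. It assumes symmetric per-tensor quantization, which is only one common scheme and may differ from what PCCX actually uses. The names `quantize_symmetric`, `w4a8_dot`, and `weight_bytes` are hypothetical and introduced here for illustration only.

```python
def quantize_symmetric(values, bits):
    """Quantize floats to signed integers in [-qmax, qmax].

    Assumption: symmetric per-tensor scaling (one scale for the whole list).
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_dot(weights, activations):
    """Dot product with 4-bit weights and 8-bit activations (illustrative)."""
    qw, w_scale = quantize_symmetric(weights, bits=4)
    qa, a_scale = quantize_symmetric(activations, bits=8)
    # Multiply-accumulate entirely in integers.
    acc = sum(w * a for w, a in zip(qw, qa))
    # A single rescale at the end returns to real-valued units.
    return acc * w_scale * a_scale


def weight_bytes(num_params, bits):
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8


if __name__ == "__main__":
    print(w4a8_dot([7.0, -3.0, 0.0, 1.0], [127.0, 64.0, -127.0, 0.0]))
    # Weight storage for one million parameters at 16-bit vs 4-bit.
    print(weight_bytes(1_000_000, 16), weight_bytes(1_000_000, 4))
```

In this sketch, 4-bit weights take a quarter of the storage of 16-bit weights. That matters for autoregressive decoding, where the weights are typically re-read for every generated token, so weight traffic tends to dominate memory bandwidth.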