Section 01
VitaLLM Overview: Key Highlights of the Ternary-Weight LLM Mixed-Precision Accelerator for Edge Devices
VitaLLM is a mixed-precision accelerator for ternary-weight LLMs on edge devices. It pairs two compute cores, TINT (a multiplier-free ternary-integer projection core) and BoothFlex (a reusable radix-4 Booth data path), with a predictive sparse attention mechanism. Implemented in a 16 nm process, it reaches a decoding speed of 72.46 tokens/s and a prefill time of 0.88 s. It occupies 0.214 mm² of area with 120 KB of on-chip memory, addressing the precision-efficiency trade-off in deploying LLMs on edge devices.
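The two arithmetic ideas named above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not VitaLLM's hardware design. `ternary_dot` shows why ternary weights in {-1, 0, +1} need no multiplier: each weight selects subtract, skip, or add of an integer activation, which is the general idea behind a multiplier-free ternary-integer projection. `booth_radix4_multiply` shows standard radix-4 Booth recoding, where each multiplier digit in {-2, -1, 0, +1, +2} needs only a shift and an add or subtract. Both function names, the `BOOTH_DIGIT` table name, and the 8-bit default width are hypothetical choices for illustration.

```python
from typing import Sequence


def ternary_dot(weights: Sequence[int], activations: Sequence[int]) -> int:
    """Dot product with ternary weights, using no multiplications.

    Each weight in {-1, 0, +1} selects subtract, skip, or add of the
    matching integer activation. (Illustrative sketch, not VitaLLM's TINT.)
    """
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x
        elif w == -1:
            acc -= x
        elif w != 0:
            raise ValueError("weights must be ternary: -1, 0, or +1")
    return acc


# Radix-4 Booth digit for each 3-bit window (b[2i+1], b[2i], b[2i-1]):
# digit = -2*b[2i+1] + b[2i] + b[2i-1]
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply a by the signed `bits`-bit integer b via radix-4 Booth recoding.

    b is recoded into bits // 2 digits in {-2, -1, 0, +1, +2}; each partial
    product is 0, a, or 2a (a shifted left by one), optionally negated, then
    shifted left by 2*i and accumulated. (Hypothetical model, not BoothFlex.)
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= b < (1 << (bits - 1)):
        raise ValueError("b does not fit in the given signed width")
    # Two's-complement bit pattern of b, with an implicit b[-1] = 0 appended.
    pattern = (b & ((1 << bits) - 1)) << 1
    result = 0
    for i in range(bits // 2):
        digit = BOOTH_DIGIT[(pattern >> (2 * i)) & 0b111]
        # Select 0, a, or 2a using only a shift, then negate if needed.
        partial = 0 if digit == 0 else (a if abs(digit) == 1 else a << 1)
        if digit < 0:
            partial = -partial
        result += partial << (2 * i)
    return result
```

For example, `ternary_dot([1, -1, 0, 1], [10, 4, 7, 2])` returns 8, and `booth_radix4_multiply(7, -3)` returns -21. The ternary set {-1, 0, +1} is a subset of the radix-4 Booth digit set, which is one way a single Booth-style data path could, in principle, serve both ternary and higher-precision operands; this is an observation, not a description of how BoothFlex is built.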