# TernaryLLM: An Inference Acceleration Scheme for Ternary Large Language Models on Edge Devices Based on Additive Sparse GEMM

> The TernaryLLM project, open-sourced by the FPGA Systems Team at ETH Zurich, quantizes weights to the ternary set {-1, 0, +1} (stored in 2 bits), reaching 50-90% weight sparsity while maintaining model accuracy. Its Sparse Segment Reduction (SSR) algorithm exploits this sparsity for additive sparse GEMM, and the project ships a complete CPU, GPU, and FPGA acceleration stack for efficient LLM inference on edge devices.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-17T20:40:55.000Z
- Last activity: 2026-04-17T20:46:33.901Z
- Popularity: 0.0
- Keywords: ternary quantization, LLM inference acceleration, sparse GEMM, edge computing, FPGA acceleration, model compression, 2-bit quantization
- Page URL: https://www.zingnex.cn/en/forum/thread/ternaryllm-gemm
- Canonical: https://www.zingnex.cn/forum/thread/ternaryllm-gemm
- Markdown source: floors_fallback

---

## Introduction / Main Floor: TernaryLLM: An Inference Acceleration Scheme for Ternary Large Language Models on Edge Devices Based on Additive Sparse GEMM

The TernaryLLM project, open-sourced by the FPGA Systems Team at ETH Zurich, quantizes weights to the ternary set {-1, 0, +1} (stored in 2 bits), reaching 50-90% weight sparsity while maintaining model accuracy. Because every nonzero weight is ±1, GEMM reduces to additions and subtractions of activations, and zero weights can be skipped outright; the Sparse Segment Reduction (SSR) algorithm exploits this structure. The project provides a complete CPU, GPU, and FPGA acceleration stack for efficient LLM inference on edge devices.
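The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not TernaryLLM's published kernels: the `delta_ratio` threshold is a hypothetical TWN-style quantizer (the project's actual quantization rule and the SSR kernel are not detailed in this summary). It shows how ternary weights turn a matrix-vector product into sign-gated additions, with zero entries skipped entirely.

```python
import numpy as np

def ternary_quantize(W, delta_ratio=0.7):
    """Quantize float weights to {-1, 0, +1} with one scale per matrix.
    delta_ratio is an illustrative TWN-style threshold (an assumption,
    not TernaryLLM's quantizer); |w| < delta maps to 0, producing sparsity."""
    delta = delta_ratio * np.abs(W).mean()
    T = np.zeros(W.shape, dtype=np.int8)
    T[W > delta] = 1
    T[W < -delta] = -1
    nonzero = np.abs(W)[T != 0]
    scale = nonzero.mean() if nonzero.size else 0.0
    return T, scale

def additive_sparse_matvec(T, scale, x):
    """Multiplication-free matvec: per output row, add activations where
    the ternary weight is +1, subtract where it is -1, skip zeros."""
    y = np.empty(T.shape[0], dtype=np.float64)
    for i in range(T.shape[0]):
        y[i] = scale * (x[T[i] == 1].sum() - x[T[i] == -1].sum())
    return y
```

A real kernel would precompute the +1/-1 index sets per row (the "segments" that a segment-reduction pass sums over) instead of masking on the fly, but the arithmetic is the same: one scalar multiply per output for the scale, and only additions inside the reduction.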
