# ITQ3_S: A High-Precision Quantization Inference Scheme for 3-Bit Large Language Models Based on Rotation Transformations

> This article introduces ITQ3_S, a 3-bit weight quantization format for large language models. It applies the Fast Walsh-Hadamard Transform (FWHT) to smooth the weights in the rotation domain, attaining perplexity comparable to FP16 on an NVIDIA RTX 5090 while delivering throughput over 1.5 times higher than 4-bit alternatives.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-30T00:03:22.000Z
- Last activity: 2026-04-01T04:47:51.029Z
- Popularity: 0.0
- Keywords: LLM Quantization, 3-bit Inference, TurboQuant, FWHT, CUDA Optimization
- Page link: https://www.zingnex.cn/en/forum/thread/itq3-s
- Canonical: https://www.zingnex.cn/forum/thread/itq3-s
- Markdown source: floors_fallback

---

## Introduction / Main Floor

This article introduces ITQ3_S, a 3-bit weight quantization format for large language models. It applies the Fast Walsh-Hadamard Transform (FWHT) to smooth the weights in the rotation domain, attaining perplexity comparable to FP16 on an NVIDIA RTX 5090 while delivering throughput over 1.5 times higher than 4-bit alternatives.
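
The post does not describe the ITQ3_S bit layout or its CUDA kernels, so the following is only a minimal sketch of the general idea behind rotation-domain smoothing: rotate a block of weights with an orthonormal FWHT, quantize to 3 bits, and rotate back after dequantization. The block length, the single per-block scale, the symmetric 7-level grid, and all function names are assumptions made for illustration, not the actual ITQ3_S format.

```python
import math


def fwht(values):
    """Orthonormal Walsh-Hadamard transform of a power-of-two-length list.

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    h = 1
    while h < n:
        # Butterfly pass: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_3bit(values):
    """Symmetric uniform quantization to integer codes in [-3, 3].

    Assumption: one floating-point scale per block, and only 7 of the
    8 levels a 3-bit code can hold are used, to keep the grid symmetric.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 3.0 if max_abs > 0 else 1.0
    codes = [max(-3, min(3, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


def roundtrip_rotated(block):
    """Rotate, quantize, dequantize, then rotate back."""
    codes, scale = quantize_3bit(fwht(block))
    return fwht(dequantize(codes, scale))


def roundtrip_plain(block):
    """Quantize and dequantize without any rotation, for comparison."""
    codes, scale = quantize_3bit(block)
    return dequantize(codes, scale)


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

To compare the two paths on a block `w` whose length is a power of two, evaluate `mse(w, roundtrip_plain(w))` and `mse(w, roundtrip_rotated(w))`. The rotation spreads a single large outlier evenly across the block (a lone value of 1.0 in a 256-element block becomes 256 values of magnitude 1/16), so the per-block scale no longer has to stretch to cover one extreme weight. Whether this lowers the error on real model weights depends on their distribution, and this sketch says nothing about the throughput figures quoted above.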