Section 01
[Introduction] quantized-SLM: Restoring Inference Capability of Quantized Small Models via Inference-Time Techniques
The quantized-SLM project aims to restore the inference fidelity of quantized small language models (SLMs) using only inference-time techniques, with no retraining and no increase in model parameters. It targets the main drawback of quantization: degraded inference performance. The intended use is model deployment for edge AI and cost-sensitive scenarios, where compression efficiency has to be balanced against inference capability.