Section 01
Introduction: Diagnosis and Recovery of Quantization Failure in 2-bit Reasoning Models
Paper Core: 2-bit quantization induces generative pathologies in reasoning models, such as repetition loops and delayed commitment. The paper proposes two lightweight control methods, FP16 planning and cycle rescue. They raise Qwen3-8B's accuracy from 17.2% to 74.2% while preserving end-to-end speed.
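This summary does not say how cycle rescue detects a loop. The sketch below is only a generic illustration of what a "loop" looks like in decoded output, and it is not the authors' method. It flags a loop when the tail of the token sequence consists of several identical copies of one block. The function name `find_tail_cycle` and the parameters `max_period` and `min_repeats` are hypothetical.

```python
from typing import List, Optional


def find_tail_cycle(tokens: List[int], max_period: int = 64, min_repeats: int = 3) -> Optional[int]:
    """Return the shortest period p such that the last p * min_repeats tokens
    are min_repeats back-to-back copies of one length-p block, else None.

    Generic repetition check for illustration; not the paper's cycle rescue.
    """
    n = len(tokens)
    for p in range(1, max_period + 1):
        span = p * min_repeats
        if span > n:
            # Longer periods need even more tokens, so stop searching.
            break
        tail = tokens[n - span:]
        block = tail[-p:]  # the most recent length-p block
        # The tail is periodic with period p iff every position matches the block.
        if all(tail[i] == block[i % p] for i in range(span)):
            return p
    return None
```

For example, `find_tail_cycle([1, 2, 3, 7, 8, 7, 8, 7, 8])` returns `2`, because the output ends in three copies of `[7, 8]`. A decoding loop could call a check like this every few steps and intervene once a cycle appears. What that intervention should be is the subject of the paper itself.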
Original Paper Information:
- Author Team: Brain Lab Research
- Source: arXiv
- Title: Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery
- Link: http://arxiv.org/abs/2606.02011v1
- Code Repository: https://github.com/brain-lab-research/quantized-reasoning
- Publication Date: 2026-06-01