Spiking Neural Networks (SNNs), often called the third generation of neural networks, have attracted considerable attention for their event-driven computation and biological plausibility.
Unlike conventional Artificial Neural Networks (ANNs), which compute dense activations at every layer, SNNs perform computation mainly when neurons emit spikes. This sparse, event-driven activity can yield substantial energy savings, which makes SNNs particularly attractive for edge computing and for deployment on neuromorphic chips.
However, applying SNNs to Large Language Models (LLMs) remains challenging. Because the discrete spikes of SNNs differ fundamentally from the continuous activations used in LLMs, directly converting a pre-trained LLM into an SNN causes significant accuracy loss, commonly referred to as 'ANN-to-SNN conversion error'. Existing conversion methods often struggle to deliver efficient spike-based inference while preserving model performance.
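To make the conversion error concrete, here is a minimal sketch, assuming the common rate-coding setup in which a non-leaky integrate-and-fire (IF) neuron with reset-by-subtraction stands in for a clipped ReLU. The function names `if_neuron_rate` and `relu_clip`, the threshold of 1.0 and the time-step counts are illustrative choices, not taken from the work described here. With T time steps the spike count can express only T + 1 output levels, so the approximation error is at most about threshold / T and shrinks only linearly as T grows.

```python
# Minimal sketch (assumption): rate-coded ANN-to-SNN conversion of a clipped ReLU.
# Names, threshold, and time-step counts are illustrative, not from the paper.


def relu_clip(x: float, threshold: float = 1.0) -> float:
    """Target ANN activation: ReLU clipped at the firing threshold."""
    return min(max(x, 0.0), threshold)


def if_neuron_rate(x: float, threshold: float = 1.0, timesteps: int = 8) -> float:
    """Drive a non-leaky IF neuron with constant input x for `timesteps` steps.

    Uses reset-by-subtraction and at most one spike per step, then returns the
    firing rate scaled by the threshold as the SNN's estimate of relu_clip(x).
    """
    v = 0.0  # membrane potential
    spikes = 0
    for _ in range(timesteps):
        v += x  # integrate the input current
        if v >= threshold:  # fire when the threshold is reached
            spikes += 1
            v -= threshold  # reset by subtraction keeps the residual charge
    return spikes * threshold / timesteps


if __name__ == "__main__":
    xs = [i / 100 for i in range(-50, 151)]
    for T in (4, 16, 64):
        worst = max(abs(if_neuron_rate(x, timesteps=T) - relu_clip(x)) for x in xs)
        print(f"T={T:3d}  max |SNN - ANN| = {worst:.4f}")
```

This toy covers only a clipped ReLU. The softmax, normalization and gated activations found in LLMs do not map onto a single IF neuron this directly, which is one reason naive conversion is harder for LLMs than for small ReLU networks.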
A research team (NJ Zheng et al.) tackled this problem with a method called 'Distribution-Aware Multi-Granularity Phase Coding', which enables efficient spike-driven inference for LLaMA-series models. The work has been accepted at ICLR 2026.
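Phase coding is a known alternative to rate coding in which a spike's contribution depends on when it occurs. The sketch below shows only a generic binary-weighted phase code, assuming a spike at phase k within a period of K phases carries weight 2^-(k+1). It is not the distribution-aware, multi-granularity scheme from the paper, whose details are not given here, and `phase_encode` / `phase_decode` are hypothetical names. With K phases it resolves 2^K levels, whereas rate coding over the same K time steps resolves only K + 1.

```python
# Minimal sketch (assumption): a generic binary-weighted phase code, NOT the
# paper's distribution-aware multi-granularity scheme. Names are hypothetical.


def phase_encode(x: float, num_phases: int = 8) -> list[int]:
    """Encode x in [0, 1) as one spike slot per phase; phase k has weight 2**-(k + 1)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    spikes = []
    residual = x
    for k in range(num_phases):
        weight = 2.0 ** -(k + 1)
        if residual >= weight:  # emit a spike if this phase's weight still fits
            spikes.append(1)
            residual -= weight
        else:
            spikes.append(0)
    return spikes


def phase_decode(spikes: list[int]) -> float:
    """Sum the weights of the phases that carry a spike."""
    return sum(s * 2.0 ** -(k + 1) for k, s in enumerate(spikes))


if __name__ == "__main__":
    x = 0.7
    train = phase_encode(x, num_phases=8)
    print(train, phase_decode(train))
```

The greedy encoder leaves a residual below 2^-K, so the decoding error after K phases is bounded by the weight of the last phase rather than by 1/K as in rate coding.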