# SpikingLLM: Reducing Conversion Error of Spiking-Driven Large Language Models via Distribution-Aware Multi-Granularity Phase Coding

> Open-source implementation of a paper accepted at ICLR 2026. The paper proposes distribution-aware multi-granularity phase coding, which reduces ANN-to-SNN conversion error and enables efficient spiking neural network inference on LLaMA-2 and LLaMA-3 models.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-16T09:13:56.000Z
- Last activity: 2026-06-16T09:27:45.615Z
- Popularity: 161.8
- Keywords: Spiking Neural Networks, SNN, Large Language Models, Phase Coding, ANN-to-SNN Conversion, ICLR 2026, LLaMA, Neuromorphic Computing, Edge Computing
- Page link: https://www.zingnex.cn/en/forum/thread/spikingllm-0d2ef38c
- Canonical: https://www.zingnex.cn/forum/thread/spikingllm-0d2ef38c
- Markdown source: floors_fallback

---

## Original Authors and Source

- Original Author/Maintainer: njzhenghy
- Source Platform: GitHub
- Original Title: SpikingLLM
- Original Link: https://github.com/njzhenghy/SpikingLLM
- Source Publication/Update Time: 2026-06-16T09:13:56Z

## Research Background: Challenges in Integrating Spiking Neural Networks with Large Language Models

Spiking Neural Networks (SNNs), often called the third generation of neural networks, have attracted considerable attention for their event-driven computation and biological plausibility.

Unlike traditional Artificial Neural Networks (ANNs), SNNs perform computation mainly when neurons emit spikes. This sparse, event-driven activity can give them a substantial energy-efficiency advantage, which makes them particularly attractive for edge computing and deployment on neuromorphic chips.

However, applying SNNs to Large Language Models (LLMs) is difficult. Discrete spikes must approximate the continuous activations of an LLM, so directly converting a pre-trained LLM to an SNN causes significant accuracy loss; this gap is known as the ANN-to-SNN conversion error. Existing conversion methods struggle to achieve efficient spike-driven inference without sacrificing model performance.

The authors (the repository is maintained by njzhenghy) address this problem with Distribution-Aware Multi-Granularity Phase Coding, which enables efficient spike-driven inference for LLaMA-series models. The work has been accepted at ICLR 2026.

## Basic Principles of Phase Coding

Phase Coding is an important temporal coding method in SNNs, which uses the timing of spike firing to encode information.

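Compared with conventional Rate Coding, Phase Coding conveys more information in fewer time steps, which improves SNN inference efficiency. With T time steps, rate coding can represent only T+1 distinct values (0 to T spikes), whereas phase coding with binary-weighted phases can represent 2^T.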

In the phase coding commonly used for ANN-to-SNN conversion, each time step (phase) within a coding window carries its own weight, typically a power of two. A neuron's activation is represented by the pattern of spikes across these phases, much like the digits of a binary number: a spike in an early phase contributes a large share of the value, and a spike in a later phase a small share. Because each time step adds one more binary digit of precision, even a small number of time steps can approximate continuous activations closely.
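To make the contrast with rate coding concrete, here is a minimal sketch that assumes the binary-weighted form of phase coding: inputs are normalized to [0, 1) and a spike at step t is worth 2^-(t+1). The function names are hypothetical and are not taken from the SpikingLLM code.

```python
def phase_encode(x: float, T: int) -> list:
    """Greedily emit T binary spikes for x in [0, 1); the spike at step t is worth 2**-(t + 1)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must be normalized to [0, 1)")
    spikes, residual = [], x
    for t in range(T):
        weight = 2.0 ** -(t + 1)   # early phases carry larger weights
        if residual >= weight:     # fire only if this phase's weight still fits
            spikes.append(1)
            residual -= weight
        else:
            spikes.append(0)
    return spikes


def phase_decode(spikes):
    """Weighted sum of spikes: how a receiving neuron would reconstruct the value."""
    return sum(s * 2.0 ** -(t + 1) for t, s in enumerate(spikes))


def rate_encode(x, T):
    """Rate coding for comparison: only the spike count matters (T + 1 levels)."""
    n = min(T, round(x * T))
    return [1] * n + [0] * (T - n)


def rate_decode(spikes):
    return sum(spikes) / len(spikes)


x, T = 0.3, 8
print(phase_decode(phase_encode(x, T)))  # 0.296875
print(rate_decode(rate_encode(x, T)))    # 0.25
```

With the same 8 time steps, the phase code lands within 2^-8 of the input, while the rate code can only pick a multiple of 1/8.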

## Multi-Granularity Coding Strategy

The research team found that phase coding with a single granularity cannot adapt well to the differing activation distributions across the layers and neurons of an LLM.

To address this, they proposed the 'Multi-Granularity Phase Coding' strategy, which allows the model to adaptively select the coding granularity based on the distribution characteristics of activation values.

Specifically, the method partitions neurons into groups, and each group can use its own coding granularity (grain). For example, some groups may use 2-level granularity (dividing the activation range into 2 intervals), while others use 3-level granularity (dividing the activation range into 3 intervals). This flexible grouping lets the code match the actual activation distribution of each neuron group more closely.
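Why grouping helps can be illustrated with a simplified sketch. It assumes that each group simply gets its own scale factor before a T-step binary-weighted phase code is applied; it does not model the paper's `grain` parameter, and the group size, T, and function names are hypothetical.

```python
def phase_quantize(x: float, scale: float, T: int) -> float:
    """Round-trip |x| / scale through a T-step binary-weighted phase code (sign kept separately)."""
    u = min(abs(x) / scale, 1.0 - 2.0 ** -T)  # normalize to [0, 1) and clip at the top code
    code = int(u * 2 ** T)                    # truncated binary expansion = greedy phase spikes
    sign = 1.0 if x >= 0 else -1.0
    return sign * code / 2 ** T * scale


def encode_groups(values, group_size, T, per_group=True):
    """Quantize values group by group, using either a per-group scale or one global scale."""
    global_scale = max(abs(v) for v in values) or 1.0
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        if per_group:
            scale = max(abs(v) for v in group) or 1.0  # adapt to this group's range
        else:
            scale = global_scale                        # one range shared by all neurons
        out.extend(phase_quantize(v, scale, T) for v in group)
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two groups with very different ranges: small activations and a few large ones
acts = [0.01, 0.02, -0.015, 0.005, 5.0, -3.0, 4.0, 2.0]
shared = encode_groups(acts, group_size=4, T=4, per_group=False)
grouped = encode_groups(acts, group_size=4, T=4, per_group=True)
print(mse(acts, shared), mse(acts, grouped))
```

With a single shared range, every small activation in the first group collapses to zero; giving that group its own range preserves them at the same number of time steps.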

## Distribution-Aware Optimization

'Distribution Awareness' is one of the core innovations of this method.

The research team analyzed the statistical distribution of activation values in each layer of LLMs and identified that neurons in different layers and positions have different activation distribution characteristics. Based on this distribution information, they designed an optimization algorithm that automatically selects the most appropriate coding granularity for each neuron group.

This distribution-aware approach allocates coding resources where they are needed: for neuron groups whose activations are concentrated, a coarser granularity is sufficient to preserve accuracy, while groups with widely dispersed activations need a finer granularity to represent the information fully.
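The allocation rule can be sketched as a simple search. This is a hypothetical stand-in, not the authors' optimization algorithm: "granularity" is modeled as the number k of evenly spaced levels across a group's value range, and each group receives the coarsest candidate whose worst-case reconstruction error stays within a tolerance `tol`.

```python
def uniform_reconstruct(group, k):
    """Snap each value to one of k evenly spaced levels spanning [min(group), max(group)]."""
    lo, hi = min(group), max(group)
    if hi == lo:
        return list(group)  # constant group: any granularity is exact
    step = (hi - lo) / (k - 1)
    return [lo + round((v - lo) / step) * step for v in group]


def choose_granularity(group, candidates, tol):
    """Return the coarsest candidate k (k >= 2) whose worst-case reconstruction error is <= tol."""
    for k in sorted(candidates):
        rec = uniform_reconstruct(group, k)
        worst = max(abs(v - r) for v, r in zip(group, rec))
        if worst <= tol:
            return k
    return max(candidates)  # nothing meets tol: fall back to the finest option


concentrated = [1.00, 1.01, 0.99, 1.02]  # narrow spread
dispersed = [-2.0, 0.5, 1.7, 3.0]        # wide spread
for name, g in [("concentrated", concentrated), ("dispersed", dispersed)]:
    print(name, choose_granularity(g, candidates=[2, 4, 8, 16], tol=0.2))
```

Under this toy rule the concentrated group is served by the coarsest option, while the dispersed group needs the finest, mirroring the allocation principle described above.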

## Supported Models and Configurations

The project provides complete training and conversion code and supports ANN-to-SNN conversion of LLaMA-2-7B and LLaMA-3-8B. The reported results on several benchmarks are:

**LLaMA-2-7B Experimental Results** (using 8 time steps, T=8):
- WikiText-2 Perplexity: 5.50 (grain=2) / 5.50 (grain=3)
- WinoGrande Accuracy: 70.48%
- ARC-Challenge Accuracy: 46.50% (grain=2) / 46.33% (grain=3)
- ARC-Easy Accuracy: 73.91% (grain=2) / 73.86% (grain=3)
- PIQA Accuracy: 78.29% (grain=2) / 78.35% (grain=3)

**LLaMA-3-8B Experimental Results** (using 8 time steps, T=8):
- WikiText-2 Perplexity: 6.34 (grain=2) / 6.33 (grain=3)
- WinoGrande Accuracy: 72.93% (grain=2) / 73.72% (grain=3)
- ARC-Challenge Accuracy: 54.01% (grain=2) / 53.41% (grain=3)
- ARC-Easy Accuracy: 77.44% (grain=2) / 77.36% (grain=3)
- PIQA Accuracy: 80.63% (grain=2) / 80.36% (grain=3)

These results show that the converted models retain strong performance with only 8 time steps. According to the project, performance also holds at other small step counts (roughly 6 to 10) and clearly exceeds that of conventional ANN-to-SNN conversion methods.

## Key Technical Components

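**Fast Hadamard Transform**: The project uses Dao-AILab's fast-hadamard-transform library to compute Hadamard transforms efficiently, a key mathematical tool in its phase-coding pipeline. A normalized Hadamard matrix is orthogonal, so rotating activations with it loses no information while spreading outlier values across channels; as in rotation-based LLM quantization, this likely makes the activations easier to encode with few time steps.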
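The outlier-spreading effect of a Hadamard rotation is easy to see in a toy example. The sketch below is a pure-Python orthonormal fast Walsh-Hadamard transform, written for illustration only; it is not the Dao-AILab CUDA kernel, and the assumption that SpikingLLM applies the transform as an activation rotation is ours.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform in O(n log n); len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)  # scaling makes the transform orthonormal (and its own inverse)
    return [v * norm for v in y]


acts = [0.1, -0.2, 0.05, 12.0, 0.3, -0.1, 0.0, 0.2]  # one outlier channel
rotated = fwht(acts)
print(max(map(abs, acts)), max(map(abs, rotated)))
```

The largest magnitude drops from 12.0 to under 5 because the outlier's energy is shared across all channels, and applying `fwht` a second time recovers the original vector.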

**Grain Analysis Optimization**: The Grain Analysis module analyzes neuron activation distributions and selects the optimal coding granularity for each neuron group. According to the repository, the tuned parameter configuration further improves on the results reported in the original paper.

**Training Framework**: The project is built on PyTorch 2.4.1 with CUDA 12.4 and integrates efficient attention implementations such as Flash Attention.
