# NeFT: A New Neuron-Level Supervised Fine-Tuning Method for Large Language Models

> NeFT proposes a neuron-level supervised fine-tuning framework. By identifying and selectively updating task-relevant neurons, it achieves efficient parameter adaptation while preserving the model's general capabilities, opening up a new path for low-cost fine-tuning of large models.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-05T16:05:44.000Z
- Last activity: 2026-05-05T16:25:58.918Z
- Popularity: 157.7
- Keywords: neuron-level fine-tuning, parameter-efficient fine-tuning, large language models, COLING2025, model adaptation, sparse updates, neural network interpretability
- Page URL: https://www.zingnex.cn/en/forum/thread/neft
- Canonical: https://www.zingnex.cn/forum/thread/neft
- Markdown source: floors_fallback

---

## NeFT: Introduction to the New Neuron-Level Supervised Fine-Tuning Method

NeFT (Neuron-level Fine-Tuning) is a neuron-level supervised fine-tuning framework for large language models published at COLING 2025. Addressing the limitation that existing Parameter-Efficient Fine-Tuning (PEFT) methods mostly operate at the layer or matrix level, it achieves more precise and efficient model adaptation by identifying and selectively updating task-relevant neurons. While preserving general capabilities, it reduces fine-tuning costs, opening a new path for low-cost fine-tuning of large models.

## Background and Technical Challenges of NeFT

The growing parameter scale of large language models makes the cost of full-parameter fine-tuning prohibitive, which has driven the emergence of PEFT techniques (LoRA, Adapter, etc.). However, existing PEFT methods mostly operate at the layer or matrix level and ignore the fine-grained behavior of individual neurons. Studies have found that large models contain "expert neurons": specific neurons that are highly sensitive to particular tasks or pieces of knowledge. Building on this observation, NeFT pushes the granularity of fine-tuning down to individual neurons.

## Core Ideas and Neuron Identification of NeFT

### Core Hypothesis
Task adaptation only requires updating the subset of neurons relevant to the target task.

### Specialized Division of Neurons
- Syntax neurons: sensitive to syntactic structures
- Knowledge neurons: store domain facts
- Reasoning neurons: participate in logical deduction
- Safety neurons: related to content filtering and ethical alignment

### Neuron Importance Evaluation
1. Activation tracking: record neuron activation patterns on task data
2. Gradient attribution: compute each neuron's gradient contribution to the loss
3. Intervention experiments: mask neurons and observe the impact on performance

The top-K most important neurons are selected by combining these signals; a rough scoring sketch follows.
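The snippet below is a minimal sketch of gradient-times-activation scoring followed by top-K mask construction, assuming a PyTorch model with HuggingFace-style outputs. The helper names `score_neurons` and `top_k_mask`, and the `mlp.up_proj` module-naming convention, are illustrative assumptions rather than the paper's released code.

```python
import torch

def score_neurons(model, task_loader, device="cpu"):
    """Accumulate per-neuron importance as |activation * gradient| on a small probe split."""
    model.to(device).train()
    scores = {}        # layer name -> accumulated importance per neuron
    activations = {}   # layer name -> activation tensor from the last forward pass

    def make_hook(name):
        def hook(module, inputs, output):
            output.retain_grad()          # keep the gradient of the activation itself
            activations[name] = output
        return hook

    # Hook the MLP up-projections whose output units play the role of "neurons".
    # The naming pattern is a hypothetical convention, not the paper's code.
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n.endswith("mlp.up_proj")]

    for batch in task_loader:             # 1-5% of the training set
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss        # assumes a HuggingFace-style loss output
        model.zero_grad()
        loss.backward()
        for name, act in activations.items():
            # gradient attribution: |a * dL/da|, summed over batch and sequence dims
            contrib = (act.detach() * act.grad).abs().sum(dim=(0, 1))
            scores[name] = scores.get(name, 0) + contrib

    for h in handles:
        h.remove()
    return scores

def top_k_mask(scores, ratio=0.05):
    """Turn importance scores into per-layer binary masks over the top `ratio` neurons."""
    masks = {}
    for name, s in scores.items():
        k = max(1, int(ratio * s.numel()))
        idx = torch.topk(s, k).indices
        mask = torch.zeros_like(s, dtype=torch.bool)
        mask[idx] = True
        masks[name] = mask
    return masks
```

Intervention experiments (masking a neuron and measuring the performance drop) could be layered on top of these scores as a verification step; they are omitted here for brevity.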

## Technical Architecture Design of NeFT

### Selective Neuron Update
Construct a sparse mask so that only the selected neurons are updated: `W_new = W_old + M ⊙ ΔW`, where `M` is a binary mask. Experiments show that updating 5-10% of neurons can match or exceed LoRA's performance.
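One way this masked update could be realized is with PyTorch gradient hooks, assuming that neuron selection maps to rows (output units) of a linear layer. The `apply_neuron_mask` helper and the row-wise convention below are assumptions, not the paper's implementation.

```python
import torch

def apply_neuron_mask(linear: torch.nn.Linear, neuron_mask: torch.Tensor) -> None:
    """Zero the gradients of unselected output neurons so the optimizer effectively
    applies W_new = W_old + M ⊙ ΔW, with M broadcast over the rows of the weight."""
    # match the weight's dtype and device; shape is (out_features,)
    row_mask = neuron_mask.to(linear.weight)

    # Each output neuron owns one row of `weight` and one bias entry.
    linear.weight.register_hook(lambda g: g * row_mask.unsqueeze(-1))
    if linear.bias is not None:
        linear.bias.register_hook(lambda g: g * row_mask)
```

Note that optimizers with decoupled weight decay would still shrink the masked rows, so the training sketch later in this post disables weight decay.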

### Cross-Layer Correlation Modeling
Introduce a neuron graph neural network: neurons serve as nodes, co-activation patterns and connection weights form the edges, and graph convolution propagates update signals across layers.
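The paper's exact graph network is not reproduced here. As an illustration of the idea, the sketch below runs one step of graph-convolution-style propagation over a co-activation adjacency matrix to smooth importance scores across correlated neurons; the thresholding rule and blending factor are assumptions.

```python
import torch

def propagate_importance(scores: torch.Tensor, coact: torch.Tensor, alpha: float = 0.5):
    """scores: (N,) per-neuron importance; coact: (N, N) co-activation statistics.
    One step of symmetric-normalized propagation blends each neuron's score with
    the scores of the neurons it tends to co-activate with."""
    adj = (coact > coact.mean()).float()        # threshold co-activation into edges
    adj = adj + torch.eye(adj.size(0))          # self-loops keep each node's own signal
    deg = adj.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)  # D^-1/2 A D^-1/2
    return alpha * scores + (1 - alpha) * (norm_adj @ scores)
```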

### Dynamic Scheduling
- Early stage: activate a wide range of neurons for rapid adaptation
- Mid stage: focus on high-importance neurons for refined adjustment
- Late stage: apply regularization to prevent overfitting (an illustrative schedule is sketched below)
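An illustrative version of such a schedule is shown below. The phase boundaries, active-neuron ratios, and regularization weight are assumed values for the sketch, not numbers from the paper.

```python
def neuron_schedule(step: int, total_steps: int):
    """Return (active_neuron_ratio, l2_reg_weight) for the current training step."""
    progress = step / max(1, total_steps)
    if progress < 0.2:      # early stage: broad activation for rapid adaptation
        return 0.20, 0.0
    if progress < 0.8:      # mid stage: narrow to high-importance neurons
        return 0.05, 0.0
    return 0.05, 1e-4       # late stage: keep the focused set, add regularization
```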

## Training Process and Technology Integration of NeFT

### Two-Stage Training
1. Neuron identification: analyze a small amount of task data (1-5% of the training set) to produce an importance ranking
2. Selective fine-tuning: train on the complete dataset with the mask applied, updating only the selected neurons (see the end-to-end sketch below)
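Putting the two stages together, the following is a rough end-to-end sketch that reuses the hypothetical `score_neurons`, `top_k_mask`, and `apply_neuron_mask` helpers from the earlier sketches. The optimizer choice and data handling are assumptions.

```python
import torch

def neft_finetune(model, probe_loader, full_loader, epochs=3, ratio=0.05, lr=1e-5):
    # Stage 1: identify task-relevant neurons on a small probe split (1-5% of the data)
    scores = score_neurons(model, probe_loader)
    masks = top_k_mask(scores, ratio=ratio)

    # Attach gradient masks so only the selected neurons receive updates
    for name, module in model.named_modules():
        if name in masks and isinstance(module, torch.nn.Linear):
            apply_neuron_mask(module, masks[name])

    # Stage 2: selective fine-tuning on the complete dataset.
    # weight_decay=0 so decoupled decay cannot move the frozen (masked) neurons.
    optim = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.0)
    for _ in range(epochs):
        for batch in full_loader:
            loss = model(**batch).loss        # HuggingFace-style loss output assumed
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model, masks
```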

### Technology Integration
- NeFT+LoRA: restrict the low-rank update to the selected neurons (a rough sketch follows this list)
- NeFT+Quantization: store inactive neurons at low precision
- NeFT+Distillation: mask-guided knowledge transfer
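As an example of the first combination, here is a hedged sketch of how a LoRA-style update could be restricted to the selected neurons. The class name, initialization, and wiring are assumptions about the idea rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class MaskedLoRALinear(nn.Module):
    """LoRA-style low-rank update whose output is restricted to selected neurons."""
    def __init__(self, base: nn.Linear, neuron_mask: torch.Tensor, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                        # freeze the dense weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: ΔW starts at 0
        self.register_buffer("mask", neuron_mask.float())          # (out_features,)

    def forward(self, x):
        delta = (x @ self.A.T) @ self.B.T                  # low-rank ΔW applied to x
        return self.base(x) + delta * self.mask            # only selected neurons change
```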

## Experimental Results and Analysis of NeFT

### Benchmark Tests
| Method | Parameter Ratio | Average Performance | General Capability Retention |
|--------|-----------------|---------------------|------------------------------|
| Full FT | 100% | 85.2% | 62.1% |
| LoRA | 0.8% | 83.7% | 78.4% |
| Adapter | 1.2% | 82.9% | 80.2% |
| NeFT | 0.5% | 84.5% | 85.7% |

NeFT achieves performance close to full fine-tuning with the lowest parameter ratio and the best retention of general capability.

### Efficiency Advantages
- Memory usage reduced by 40-50%
- Backpropagation computation reduced by 60%
- No additional overhead at inference time

## Application Scenarios and Value of NeFT

- **Multi-task serving**: share one base model while each task keeps an independent mask, cutting memory usage by an order of magnitude (see the sketch after this list)
- **Privacy-sensitive domains**: sparse updates reduce the gradients that must be uploaded, supporting federated learning
- **Model safety**: monitor "safety neurons" to achieve fine-grained alignment
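For the multi-task case, a minimal sketch of storing and applying per-task sparse deltas on top of a shared base model is shown below; the state-dict handling and parameter naming are assumptions.

```python
import torch

def extract_task_delta(tuned_state, base_state, masks):
    """Keep only the rows of each masked layer that actually changed for one task."""
    delta = {}
    for name, mask in masks.items():
        key = f"{name}.weight"                      # hypothetical parameter naming
        delta[key] = (tuned_state[key] - base_state[key])[mask]
    return delta

def apply_task_delta(base_state, delta, masks):
    """Patch a shared base model's weights with one task's sparse delta."""
    patched = {k: v.clone() for k, v in base_state.items()}
    for name, mask in masks.items():
        key = f"{name}.weight"
        patched[key][mask] += delta[key]
    return patched
```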

## Summary and Future Directions of NeFT

### Summary
NeFT pushes fine-tuning toward finer granularity, balancing efficiency, performance, and generality.

### Limitations
1. High cost of neuron identification
2. Mask stability needs improvement
3. Insufficient interpretability of neuron encoding

### Future Directions
- Automatic neuron architecture search
- Continual learning and memory
- Cross-model neuron alignment
- Dedicated sparse-update accelerators
