NeFT: A New Neuron-Level Supervised Fine-Tuning Method for Large Language Models

NeFT proposes a neuron-level supervised fine-tuning framework. By identifying and selectively updating task-relevant neurons, it achieves efficient parameter adaptation while preserving the model's general capabilities, opening up a new path for low-cost fine-tuning of large models.

Tags: Neuron-level fine-tuning, Parameter-efficient fine-tuning, Large language models, COLING 2025, Model adaptation, Sparse updates, Neural network interpretability
Published 2026-05-06 00:05 · Recent activity 2026-05-06 00:25 · Estimated read 7 min

Section 01

NeFT: Introduction to the New Neuron-Level Supervised Fine-Tuning Method

NeFT (Neuron-level Fine-Tuning) is a neuron-level supervised fine-tuning framework for large language models published at COLING 2025. Addressing the limitation that existing Parameter-Efficient Fine-Tuning (PEFT) methods mostly operate at the layer or matrix level, it achieves more precise and efficient model adaptation by identifying and selectively updating task-relevant neurons. While preserving general capabilities, it reduces fine-tuning costs, opening a new path for low-cost fine-tuning of large models.


Section 02

Background and Technical Challenges of NeFT

The growing parameter scale of large language models makes full-parameter fine-tuning prohibitively expensive, which has driven the emergence of PEFT techniques such as LoRA and Adapter. However, existing PEFT methods mostly operate at the layer or matrix level and ignore the fine-grained behavior of individual neurons. Studies have found that large models contain "expert neurons": specific neurons that are highly sensitive to specific tasks or knowledge. Building on this observation, NeFT pushes the fine-tuning granularity down to individual neurons.


Section 03

Core Ideas and Neuron Identification of NeFT

Core Hypothesis

Task adaptation requires updating only a subset of neurons relevant to the target task

Specialized Division of Neurons

  • Syntax neurons: sensitive to syntactic structures
  • Knowledge neurons: store domain facts
  • Reasoning neurons: participate in logical deduction
  • Safety neurons: related to content filtering and ethical alignment

Neuron Importance Evaluation

  1. Activation tracking: record activation patterns of task data
  2. Gradient attribution: calculate gradient contribution to loss
  3. Intervention experiments: mask neurons and observe the impact on performance

These signals are combined to select the top-K most important neurons; a minimal scoring sketch follows.
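
The sketch below assumes a PyTorch classifier and uses the accumulated L1 norm of each neuron's weight-row gradient as the attribution signal; the exact scoring formula in the paper may differ, and the helper names (score_neurons, top_k_mask) are illustrative.

```python
import torch
import torch.nn as nn

def score_neurons(model, data_loader, loss_fn, device="cpu"):
    """Score each output neuron of every Linear layer by the accumulated
    L1 norm of its weight-row gradient (a simple first-order attribution)."""
    model.to(device).train()
    scores = {name: torch.zeros(m.out_features)
              for name, m in model.named_modules() if isinstance(m, nn.Linear)}

    for inputs, labels in data_loader:            # small task subset, e.g. 1-5% of the data
        inputs, labels = inputs.to(device), labels.to(device)
        model.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear) and module.weight.grad is not None:
                # Row i of weight.grad corresponds to output neuron i.
                scores[name] += module.weight.grad.abs().sum(dim=1).cpu()
    return scores

def top_k_mask(scores, k_ratio=0.05):
    """Keep the top k_ratio fraction of neurons per layer as a boolean mask."""
    masks = {}
    for name, s in scores.items():
        k = max(1, int(k_ratio * s.numel()))
        mask = torch.zeros_like(s, dtype=torch.bool)
        mask[torch.topk(s, k).indices] = True
        masks[name] = mask
    return masks
```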

Section 04

Technical Architecture Design of NeFT

Selective Neuron Update

Construct a sparse mask and update only the selected neurons: W_new = W_old + M ⊙ ΔW, where M is a binary mask. Experiments show that updating 5-10% of neurons achieves performance equivalent to or better than LoRA.
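
In practice the mask can be applied by zeroing the gradients of unselected neurons before each optimizer step, which is equivalent to masking ΔW row by row. A minimal sketch, assuming the per-layer boolean masks produced in the identification stage (the function name is illustrative):

```python
import torch.nn as nn

def apply_neuron_masks(model, masks):
    """Zero the gradients of unselected neurons before the optimizer step,
    which realizes W_new = W_old + M ⊙ ΔW with a row-wise binary mask M."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in masks and module.weight.grad is not None:
            mask = masks[name].to(device=module.weight.device, dtype=module.weight.dtype)
            module.weight.grad *= mask.unsqueeze(1)      # keep only selected weight rows
            if module.bias is not None and module.bias.grad is not None:
                module.bias.grad *= mask

# Usage inside a training step (sketch):
#   loss.backward()
#   apply_neuron_masks(model, masks)
#   optimizer.step()
#   optimizer.zero_grad()
```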

Cross-Layer Correlation Modeling

Introduce a graph neural network over neurons: neurons serve as nodes, co-activation patterns or connection weights serve as edges, and graph convolution propagates update signals across layers.
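
One simple reading of this mechanism is to propagate per-neuron importance scores over a co-activation adjacency matrix with a few normalized aggregation steps. The sketch below is a schematic interpretation under that assumption, not the paper's exact architecture.

```python
import torch

def propagate_scores(scores, activations, hops=2, alpha=0.5):
    """Smooth per-neuron importance scores over a co-activation graph.

    scores:      [N] initial importance of N neurons
    activations: [T, N] activations of the same neurons on T task examples
    """
    # Co-activation adjacency: positive correlation between activation patterns.
    act = (activations - activations.mean(dim=0)) / (activations.std(dim=0) + 1e-6)
    adj = (act.T @ act / act.shape[0]).clamp(min=0)
    adj.fill_diagonal_(0)

    # Row-normalize so each propagation step averages over neighbors.
    adj_norm = adj / (adj.sum(dim=1, keepdim=True) + 1e-6)

    out = scores.clone()
    for _ in range(hops):
        # Blend a neuron's own score with its neighbors' (graph-convolution style).
        out = alpha * out + (1 - alpha) * adj_norm @ out
    return out
```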

Dynamic Scheduling

  • Early stage: update a broad range of neurons for rapid adaptation
  • Mid stage: focus on high-importance neurons for refined adjustment
  • Late stage: apply regularization to prevent overfitting (a schedule sketch follows this list).
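
A hedged sketch of such a schedule; the 30%/70% phase boundaries, keep ratios, and regularization weight below are illustrative assumptions, not values from the paper.

```python
def mask_schedule(step, total_steps):
    """Return (neuron keep ratio, regularization weight) for the current step."""
    progress = step / max(1, total_steps)
    if progress < 0.3:       # early: update a broad set of neurons
        return 0.20, 0.0
    elif progress < 0.7:     # mid: focus on the most important neurons
        return 0.05, 0.0
    else:                    # late: keep focus, add regularization against overfitting
        return 0.05, 0.01

# Inside the training loop (sketch):
#   keep_ratio, reg_weight = mask_schedule(step, total_steps)
#   masks = top_k_mask(scores, k_ratio=keep_ratio)
#   loss = task_loss + reg_weight * sum(p.pow(2).sum() for p in model.parameters())
```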

Section 05

Training Process and Technology Integration of NeFT

Two-Stage Training

  1. Neuron identification: analyze a small amount of task data (1-5% of the training set) to generate an importance ranking
  2. Selective fine-tuning: train on the complete dataset with the mask applied, updating only the selected neurons (both stages are sketched below)
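
Putting the two stages together, a minimal end-to-end loop built on the illustrative helpers sketched in earlier sections (score_neurons, top_k_mask, apply_neuron_masks):

```python
import torch

def neft_finetune(model, id_loader, train_loader, loss_fn, epochs=3, k_ratio=0.05):
    """Stage 1: score neurons on a small identification split.
    Stage 2: fine-tune on the full data, updating only the selected neurons."""
    # Stage 1: neuron identification on roughly 1-5% of the training data.
    scores = score_neurons(model, id_loader, loss_fn)
    masks = top_k_mask(scores, k_ratio=k_ratio)

    # Stage 2: selective fine-tuning on the complete dataset.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            apply_neuron_masks(model, masks)   # keep only selected neurons' gradients
            optimizer.step()
    return model, masks
```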

Technology Integration

  • NeFT+LoRA: restrict the low-rank update to selected neurons (see the sketch after this list)
  • NeFT+Quantization: low-precision storage for inactive neurons
  • NeFT+Distillation: mask-guided knowledge transfer.
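
For the NeFT+LoRA combination, one straightforward realization is to mask the rows of the low-rank update so that only selected output neurons receive it. The class below is a hedged sketch of that idea, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLoRALinear(nn.Module):
    """Frozen base Linear plus a LoRA-style low-rank update restricted to
    selected output neurons (an illustrative NeFT+LoRA combination)."""
    def __init__(self, base: nn.Linear, neuron_mask: torch.Tensor, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        # Binary mask over output neurons: only these rows receive the update.
        self.register_buffer("mask", neuron_mask.to(torch.float32).unsqueeze(1))

    def forward(self, x):
        delta_w = (self.mask * self.lora_b) @ self.lora_a   # masked ΔW = M ⊙ (B A)
        return self.base(x) + F.linear(x, delta_w)
```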

Section 06

Experimental Results and Analysis of NeFT

Benchmark Tests

Method    Parameter Ratio    Average Performance    General Capability Retention
Full FT   100%               85.2%                  62.1%
LoRA      0.8%               83.7%                  78.4%
Adapter   1.2%               82.9%                  80.2%
NeFT      0.5%               84.5%                  85.7%

NeFT achieves performance close to Full FT with the lowest parameter ratio and the best general capability retention.

Efficiency Advantages

  • Memory usage reduced by 40-50%
  • Backpropagation computation reduced by 60%
  • No additional overhead in inference.

Section 07

Application Scenarios and Value of NeFT

  • Multi-task service: share one base model with an independent mask per task, cutting memory usage by an order of magnitude (a storage sketch follows this list)
  • Privacy domain: sparse updates reduce gradient uploads, supporting federated learning
  • Model safety: monitor "safety neurons" to achieve fine-grained alignment.
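
In the multi-task setting, only the sparse per-task deltas of the selected neurons need to be stored alongside one shared base model. The sketch below illustrates this idea; the storage format and function names are assumptions, not part of the paper.

```python
import torch.nn as nn

def extract_task_delta(base_model, tuned_model, masks):
    """Store only the updated weight rows of selected neurons as a per-task delta."""
    tuned = dict(tuned_model.named_modules())
    delta = {}
    for name, base_m in base_model.named_modules():
        if isinstance(base_m, nn.Linear) and name in masks:
            rows = masks[name]                              # boolean mask over output neurons
            diff = tuned[name].weight.data[rows] - base_m.weight.data[rows]
            delta[name] = (rows, diff)
    return delta

def apply_task_delta(base_model, delta, sign=1.0):
    """Patch (sign=+1) or un-patch (sign=-1) the shared base model with one task's delta."""
    for name, module in base_model.named_modules():
        if name in delta:
            rows, diff = delta[name]
            module.weight.data[rows] += sign * diff
```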

Section 08

Summary and Future Directions of NeFT

Summary

NeFT promotes the evolution of fine-tuning towards fine granularity, balancing efficiency, performance, and generality

Limitations

  1. High cost of neuron identification
  2. Mask stability needs improvement
  3. Limited interpretability of what individual neurons encode

Future Directions

  • Automatic neuron architecture search
  • Continual learning and memory
  • Cross-model neuron alignment
  • Dedicated sparse update accelerators.