NeFT: A New Neuron-Level Supervised Fine-Tuning Method for Large Language Models

NeFT proposes a neuron-level supervised fine-tuning framework. By identifying and selectively updating task-relevant neurons, it achieves efficient parameter adaptation while preserving the model's general capabilities, opening up a new path for low-cost fine-tuning of large models.

Tags: Neuron-level fine-tuning, Parameter-efficient fine-tuning, Large language models, COLING 2025, Model adaptation, Sparse updates, Neural network interpretability
Published 2026-05-06 00:05 · Recent activity 2026-05-06 00:25 · Estimated read 7 min

Section 01

NeFT: Introduction to the New Neuron-Level Supervised Fine-Tuning Method

NeFT (Neuron-level Fine-Tuning) is a neuron-level supervised fine-tuning framework for large language models published at COLING 2025. Addressing the limitation that existing Parameter-Efficient Fine-Tuning (PEFT) methods mostly operate at the layer or matrix level, it achieves more precise and efficient model adaptation by identifying and selectively updating task-relevant neurons. While preserving general capabilities, it reduces fine-tuning costs, opening a new path for low-cost fine-tuning of large models.


Section 02

Background and Technical Challenges of NeFT

The growing parameter scale of large language models makes full-parameter fine-tuning prohibitively expensive, which has driven the emergence of PEFT techniques such as LoRA and Adapter. However, existing PEFT methods mostly operate at the layer or matrix level and ignore the fine-grained behavior of individual neurons. Studies have found that large models contain "expert neurons": specific neurons that are highly sensitive to specific tasks or knowledge. Building on this observation, NeFT pushes the fine-tuning granularity down to individual neurons.


Section 03

Core Ideas and Neuron Identification of NeFT

Core Hypothesis

Task adaptation requires updating only a subset of neurons relevant to the target task

Specialized Division of Neurons

  • Syntax neurons: sensitive to syntactic structures
  • Knowledge neurons: store domain facts
  • Reasoning neurons: participate in logical deduction
  • Safety neurons: related to content filtering and ethical alignment

Neuron Importance Evaluation

  1. Activation tracking: record activation patterns of task data
  2. Gradient attribution: calculate gradient contribution to loss
  3. Intervention experiments: mask neurons and observe the impact on performance

These signals are combined to select the top-K most important neurons; a minimal scoring sketch follows.
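
The sketch below assumes a PyTorch classifier and uses the accumulated L1 norm of each neuron's weight-row gradient as the attribution signal; the exact scoring formula in the paper may differ, and the helper names (score_neurons, top_k_mask) are illustrative.

```python
import torch
import torch.nn as nn

def score_neurons(model, data_loader, loss_fn, device="cpu"):
    """Score each output neuron of every Linear layer by the accumulated
    L1 norm of its weight-row gradient (a simple first-order attribution)."""
    model.to(device).train()
    scores = {name: torch.zeros(m.out_features)
              for name, m in model.named_modules() if isinstance(m, nn.Linear)}

    for inputs, labels in data_loader:            # small task subset, e.g. 1-5% of the data
        inputs, labels = inputs.to(device), labels.to(device)
        model.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear) and module.weight.grad is not None:
                # Row i of weight.grad corresponds to output neuron i.
                scores[name] += module.weight.grad.abs().sum(dim=1).cpu()
    return scores

def top_k_mask(scores, k_ratio=0.05):
    """Keep the top k_ratio fraction of neurons per layer as a boolean mask."""
    masks = {}
    for name, s in scores.items():
        k = max(1, int(k_ratio * s.numel()))
        mask = torch.zeros_like(s, dtype=torch.bool)
        mask[torch.topk(s, k).indices] = True
        masks[name] = mask
    return masks
```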

Section 04

Technical Architecture Design of NeFT

Selective Neuron Update

Construct a sparse mask and update only the selected neurons: W_new = W_old + M ⊙ ΔW, where M is a binary mask. Experiments show that updating 5-10% of neurons achieves performance equivalent to or better than LoRA.
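
In practice the mask can be applied by zeroing the gradients of unselected neurons before each optimizer step, which is equivalent to masking ΔW row by row. A minimal sketch, assuming the per-layer boolean masks produced in the identification stage (the function name is illustrative):

```python
import torch.nn as nn

def apply_neuron_masks(model, masks):
    """Zero the gradients of unselected neurons before the optimizer step,
    which realizes W_new = W_old + M ⊙ ΔW with a row-wise binary mask M."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in masks and module.weight.grad is not None:
            mask = masks[name].to(device=module.weight.device, dtype=module.weight.dtype)
            module.weight.grad *= mask.unsqueeze(1)      # keep only selected weight rows
            if module.bias is not None and module.bias.grad is not None:
                module.bias.grad *= mask

# Usage inside a training step (sketch):
#   loss.backward()
#   apply_neuron_masks(model, masks)
#   optimizer.step()
#   optimizer.zero_grad()
```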

Cross-Layer Correlation Modeling

Introduce a graph neural network over neurons: neurons serve as nodes, co-activation patterns or connection weights serve as edges, and graph convolution propagates update signals across layers.
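
One simple reading of this mechanism is to propagate per-neuron importance scores over a co-activation adjacency matrix with a few normalized aggregation steps. The sketch below is a schematic interpretation under that assumption, not the paper's exact architecture.

```python
import torch

def propagate_scores(scores, activations, hops=2, alpha=0.5):
    """Smooth per-neuron importance scores over a co-activation graph.

    scores:      [N] initial importance of N neurons
    activations: [T, N] activations of the same neurons on T task examples
    """
    # Co-activation adjacency: positive correlation between activation patterns.
    act = (activations - activations.mean(dim=0)) / (activations.std(dim=0) + 1e-6)
    adj = (act.T @ act / act.shape[0]).clamp(min=0)
    adj.fill_diagonal_(0)

    # Row-normalize so each propagation step averages over neighbors.
    adj_norm = adj / (adj.sum(dim=1, keepdim=True) + 1e-6)

    out = scores.clone()
    for _ in range(hops):
        # Blend a neuron's own score with its neighbors' (graph-convolution style).
        out = alpha * out + (1 - alpha) * adj_norm @ out
    return out
```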

Dynamic Scheduling

  • Early stage: update a broad range of neurons for rapid adaptation
  • Mid stage: focus on high-importance neurons for refined adjustment
  • Late stage: apply regularization to prevent overfitting (a schedule sketch follows this list).
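
A hedged sketch of such a schedule; the 30%/70% phase boundaries, keep ratios, and regularization weight below are illustrative assumptions, not values from the paper.

```python
def mask_schedule(step, total_steps):
    """Return (neuron keep ratio, regularization weight) for the current step."""
    progress = step / max(1, total_steps)
    if progress < 0.3:       # early: update a broad set of neurons
        return 0.20, 0.0
    elif progress < 0.7:     # mid: focus on the most important neurons
        return 0.05, 0.0
    else:                    # late: keep focus, add regularization against overfitting
        return 0.05, 0.01

# Inside the training loop (sketch):
#   keep_ratio, reg_weight = mask_schedule(step, total_steps)
#   masks = top_k_mask(scores, k_ratio=keep_ratio)
#   loss = task_loss + reg_weight * sum(p.pow(2).sum() for p in model.parameters())
```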

Section 05

Training Process and Technology Integration of NeFT

Two-Stage Training

  1. Neuron identification: analyze a small amount of task data (1-5% of the training set) to generate an importance ranking
  2. Selective fine-tuning: train on the complete dataset with the mask applied, updating only the selected neurons (both stages are sketched below)
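
Putting the two stages together, a minimal end-to-end loop built on the illustrative helpers sketched in earlier sections (score_neurons, top_k_mask, apply_neuron_masks):

```python
import torch

def neft_finetune(model, id_loader, train_loader, loss_fn, epochs=3, k_ratio=0.05):
    """Stage 1: score neurons on a small identification split.
    Stage 2: fine-tune on the full data, updating only the selected neurons."""
    # Stage 1: neuron identification on roughly 1-5% of the training data.
    scores = score_neurons(model, id_loader, loss_fn)
    masks = top_k_mask(scores, k_ratio=k_ratio)

    # Stage 2: selective fine-tuning on the complete dataset.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            apply_neuron_masks(model, masks)   # keep only selected neurons' gradients
            optimizer.step()
    return model, masks
```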

Technology Integration

  • NeFT+LoRA: restrict the low-rank update to selected neurons (see the sketch after this list)
  • NeFT+Quantization: low-precision storage for inactive neurons
  • NeFT+Distillation: mask-guided knowledge transfer.
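
For the NeFT+LoRA combination, one straightforward realization is to mask the rows of the low-rank update so that only selected output neurons receive it. The class below is a hedged sketch of that idea, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLoRALinear(nn.Module):
    """Frozen base Linear plus a LoRA-style low-rank update restricted to
    selected output neurons (an illustrative NeFT+LoRA combination)."""
    def __init__(self, base: nn.Linear, neuron_mask: torch.Tensor, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        # Binary mask over output neurons: only these rows receive the update.
        self.register_buffer("mask", neuron_mask.to(torch.float32).unsqueeze(1))

    def forward(self, x):
        delta_w = (self.mask * self.lora_b) @ self.lora_a   # masked ΔW = M ⊙ (B A)
        return self.base(x) + F.linear(x, delta_w)
```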

Section 06

Experimental Results and Analysis of NeFT

Benchmark Tests

Method    Parameter Ratio    Average Performance    General Capability Retention
Full FT   100%               85.2%                  62.1%
LoRA      0.8%               83.7%                  78.4%
Adapter   1.2%               82.9%                  80.2%
NeFT      0.5%               84.5%                  85.7%

NeFT achieves performance close to Full FT with the lowest parameter ratio and the best general capability retention.

Efficiency Advantages

  • Memory usage reduced by 40-50%
  • Backpropagation computation reduced by 60%
  • No additional overhead in inference.

Section 07

Application Scenarios and Value of NeFT

  • Multi-task service: share one base model with an independent mask per task, cutting memory usage by an order of magnitude (a storage sketch follows this list)
  • Privacy domain: sparse updates reduce gradient uploads, supporting federated learning
  • Model safety: monitor "safety neurons" to achieve fine-grained alignment.
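
In the multi-task setting, only the sparse per-task deltas of the selected neurons need to be stored alongside one shared base model. The sketch below illustrates this idea; the storage format and function names are assumptions, not part of the paper.

```python
import torch.nn as nn

def extract_task_delta(base_model, tuned_model, masks):
    """Store only the updated weight rows of selected neurons as a per-task delta."""
    tuned = dict(tuned_model.named_modules())
    delta = {}
    for name, base_m in base_model.named_modules():
        if isinstance(base_m, nn.Linear) and name in masks:
            rows = masks[name]                              # boolean mask over output neurons
            diff = tuned[name].weight.data[rows] - base_m.weight.data[rows]
            delta[name] = (rows, diff)
    return delta

def apply_task_delta(base_model, delta, sign=1.0):
    """Patch (sign=+1) or un-patch (sign=-1) the shared base model with one task's delta."""
    for name, module in base_model.named_modules():
        if name in delta:
            rows, diff = delta[name]
            module.weight.data[rows] += sign * diff
```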

Section 08

Summary and Future Directions of NeFT

Summary

NeFT promotes the evolution of fine-tuning towards fine granularity, balancing efficiency, performance, and generality

Limitations

  1. High cost of neuron identification
  2. Mask stability needs improvement
  3. Limited interpretability of what individual neurons encode

Future Directions

  • Automatic neuron architecture search
  • Continual learning and memory
  • Cross-model neuron alignment
  • Dedicated sparse update accelerators.