Fairness Pruning: Eliminating Biases in Large Language Models via Activation-Guided MLP Pruning

This article introduces a new method called Fairness Pruning, which uses activation-guided MLP width pruning to reduce biases in large language models (LLMs) without significantly sacrificing model performance.

Large Language Models · Bias Mitigation · Model Pruning · Fairness Pruning · MLP Pruning · Activation Analysis · Llama · AI Fairness · Neural Network Optimization
Published 2026-04-27 16:46 · Recent activity 2026-04-27 16:50 · Estimated read 6 min

Section 01

Fairness Pruning: A New Method for Mitigating LLM Biases via Activation-Guided MLP Pruning

This article introduces an innovative method called Fairness Pruning, which precisely identifies and removes biased neurons in models through activation-guided MLP width pruning technology. It effectively reduces biases in large language models (LLMs) without significantly sacrificing model performance, providing a new approach to solving the fairness-performance trade-off problem in LLMs.

Section 02

Background of LLM Bias Problem and Challenges of Existing Methods

Biases in LLMs stem from unevenly distributed training data, from which models readily absorb and amplify social stereotypes. Existing bias mitigation methods (data debiasing, training constraints, post-processing adjustments) generally face a "fairness-performance trade-off" dilemma: optimizing too aggressively for fairness can significantly degrade the model's overall performance.

Section 03

Core Idea of Fairness Pruning: Dual-Objective Optimization to Locate Biased Neurons

The core insight of Fairness Pruning is that LLM biases are concentrated in specific neuron subsets. The method recasts bias mitigation as a network-structure optimization problem using a dual-objective framework: it simultaneously considers each neuron's bias contribution and structural importance, and prioritizes pruning neurons with "high bias contribution and low structural importance."
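The dual-objective ranking above can be sketched as a simple per-neuron score. This is a minimal illustration, not the paper's exact formulation: the normalization and the `alpha` weighting between the two objectives are assumptions for the example.

```python
import numpy as np

def pruning_scores(bias_contribution, structural_importance, alpha=0.5):
    """Rank neurons for pruning: high bias contribution AND low
    structural importance should score highest.

    Both inputs are per-neuron arrays. `alpha` trades off the two
    normalized objectives (a hypothetical weighting, not from the paper).
    """
    # Min-max normalize each objective to [0, 1] so they are comparable.
    b = (bias_contribution - bias_contribution.min()) / (np.ptp(bias_contribution) + 1e-12)
    s = (structural_importance - structural_importance.min()) / (np.ptp(structural_importance) + 1e-12)
    # (1 - s) rewards LOW structural importance; higher score = prune first.
    return alpha * b + (1 - alpha) * (1 - s)

# Toy example: neuron 2 has high bias contribution and low importance,
# so it should be ranked first for removal.
bias = np.array([0.1, 0.2, 0.9, 0.3])
imp  = np.array([0.8, 0.9, 0.1, 0.7])
order = np.argsort(-pruning_scores(bias, imp))
print(order[0])  # → 2
```

In practice the two objectives come from the activation analysis and importance estimates described in the following sections; the score only decides the pruning order.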

Section 04

Activation-Guided Mechanism for Detecting Biased Neurons

Fairness Pruning identifies biased neurons through activation analysis: it uses the OptiPFair tool to analyze the model's activation patterns when processing texts involving sensitive attributes (e.g., gender, race), and flags neurons that show systematic activation differences. The approach requires no additional training data or gradient computation, is interpretable, and can localize bias layer by layer.

Section 05

Implementation Strategy of MLP Width Pruning

After identifying biased neurons, a structured width pruning strategy (reducing the number of hidden units in MLP layers) is adopted, following a greedy strategy: neurons with high bias contribution but low structural importance are prioritized for removal. Structural importance is measured by the neuron's impact on output, and bias contribution is evaluated by changes in fairness metrics.
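Structured width pruning can be sketched as physically deleting hidden units from a two-layer MLP, so the pruned model is genuinely smaller rather than merely masked. The weight shapes below are a generic illustration, not the specific Llama MLP layout.

```python
import numpy as np

def width_prune_mlp(W_in, b_in, W_out, prune_idx):
    """Structured width pruning of a two-layer MLP
    (hidden = x @ W_in + b_in; out = hidden @ W_out).

    Removing columns of W_in (and the matching bias entries and rows of
    W_out) shrinks the hidden dimension itself, which is what yields the
    inference-speed and memory savings of structured pruning.
    """
    keep = np.setdiff1d(np.arange(W_in.shape[1]), prune_idx)
    return W_in[:, keep], b_in[keep], W_out[keep, :]

# Toy MLP: 3 inputs -> 5 hidden -> 2 outputs; prune hidden units 1 and 3
# (e.g., the units ranked first by the greedy bias/importance score).
rng = np.random.default_rng(1)
W_in  = rng.normal(size=(3, 5))
b_in  = rng.normal(size=5)
W_out = rng.normal(size=(5, 2))
W_in2, b_in2, W_out2 = width_prune_mlp(W_in, b_in, W_out, [1, 3])
print(W_in2.shape, W_out2.shape)  # → (3, 3) (3, 2)
```

A greedy loop would repeat this: score the remaining neurons, remove the worst offenders, and re-check the fairness metric until the desired bias reduction (or a performance floor) is reached.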

Section 06

Experimental Validation: Effects and Advantages of Fairness Pruning

This method has been validated on mainstream models such as Llama-3.2 (1B/3B parameter versions) and Salamandra-2B. The results show that it can significantly reduce bias metrics in fairness benchmark tests while maintaining the overall performance of the model; the pruned model also gains practical benefits in inference speed and memory usage, making it suitable for resource-constrained environments.

Section 07

Practical Significance and Limitations of Fairness Pruning

Practical significance: it provides a practical tool for responsible AI deployment, suited to sensitive scenarios such as recruitment assistance, credit evaluation, and content moderation, where it can quickly reduce bias risk. Limitations: the activation analysis requires test cases designed for specific biases (i.e., prior knowledge of which biases to probe), and pruning is irreversible, so related capabilities may be permanently lost.

Section 08

Future Directions and Conclusion

Future research directions include: developing more fine-grained neuron importance evaluation methods, exploring strategies combining pruning and fine-tuning, and extending fairness pruning to other components of the model (such as attention heads). Fairness Pruning represents an important advancement in the field of LLM bias mitigation, providing new possibilities for balancing performance and fairness, and will help promote the responsible development of AI.