# Fairness Pruning: Eliminating Biases in Large Language Models via Activation-Guided MLP Pruning

> This article introduces a method called Fairness Pruning, which reduces biases in large language models (LLMs) without significantly sacrificing model performance, using activation-guided MLP width pruning.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-04-27T08:46:36.000Z
- Last activity: 2026-04-27T08:50:00.874Z
- Popularity: 161.9
- Keywords: large language models, bias mitigation, model pruning, Fairness Pruning, MLP pruning, activation analysis, Llama, AI fairness, neural network optimization
- Page link: https://www.zingnex.cn/en/forum/thread/fairness-pruning-mlp
- Canonical: https://www.zingnex.cn/forum/thread/fairness-pruning-mlp
- Markdown source: floors_fallback

---

## Fairness Pruning: A New Method for Mitigating LLM Biases via Activation-Guided MLP Pruning

This article introduces Fairness Pruning, a method that uses activation-guided MLP width pruning to precisely identify and remove biased neurons. It effectively reduces biases in large language models (LLMs) without significantly sacrificing model performance, offering a new approach to the fairness-performance trade-off in LLMs.

## Background of LLM Bias Problem and Challenges of Existing Methods

Biases in LLMs stem from the uneven distribution of training data, from which models readily absorb and amplify social stereotypes. Existing bias mitigation methods (data debiasing, training constraints, post-processing adjustments) generally face a "fairness-performance trade-off": pursuing fairness too aggressively can cause a significant decline in the model's overall performance.

## Core Idea of Fairness Pruning: Dual-Objective Optimization to Locate Biased Neurons

The core insight of Fairness Pruning is that LLM biases are concentrated in specific neuron subsets. The method recasts fairness as a network structure optimization problem via a dual-objective framework: it jointly considers each neuron's bias contribution and structural importance, and prioritizes pruning neurons with "high bias contribution and low structural importance".
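The dual-objective selection can be sketched as a simple scoring rule. This is an illustrative reconstruction, not the paper's actual formula: the min-max normalisation and the `alpha` weighting are assumptions introduced here to show how "high bias, low importance" neurons rank first.

```python
import numpy as np

def rank_pruning_candidates(bias_contrib, struct_importance, alpha=0.5):
    """Rank neurons as pruning candidates: a high score means high bias
    contribution combined with low structural importance. Both inputs are
    per-neuron arrays; `alpha` (hypothetical) weights the two objectives
    after min-max normalisation."""
    b = (bias_contrib - bias_contrib.min()) / (np.ptp(bias_contrib) + 1e-12)
    s = (struct_importance - struct_importance.min()) / (np.ptp(struct_importance) + 1e-12)
    score = alpha * b + (1 - alpha) * (1 - s)
    return np.argsort(-score)  # neuron indices, best pruning candidates first

# Toy example: neuron 2 has high bias and low importance, so it ranks first.
bias = np.array([0.1, 0.3, 0.9, 0.2])
imp  = np.array([0.8, 0.5, 0.1, 0.9])
order = rank_pruning_candidates(bias, imp)
```

Any monotone combination of the two normalised objectives would serve the same purpose; the linear blend above is just the simplest choice.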

## Activation-Guided Mechanism for Detecting Biased Neurons

Fairness Pruning identifies biased neurons through activation analysis: it uses the OptiPFair tool to examine the model's activation patterns when processing texts containing sensitive attributes (e.g., gender, race), and flags neurons that show systematic activation differences. The approach requires no additional training data or gradient computation, is interpretable, and can localize bias layer by layer.
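The detection step above can be sketched as follows. This is a hedged toy version: in practice the activations would be captured from a real model's MLP layers (e.g., via forward hooks, as OptiPFair does internally), and the fixed `threshold` is an assumption for illustration, not the tool's actual criterion.

```python
import numpy as np

def biased_neuron_mask(acts_group_a, acts_group_b, threshold=0.5):
    """Flag neurons whose mean activation differs systematically between
    counterfactual prompt groups (e.g., the same sentences with 'he' vs
    'she'). Inputs are (num_prompts, num_neurons) activation matrices."""
    diff = np.abs(acts_group_a.mean(axis=0) - acts_group_b.mean(axis=0))
    return diff > threshold  # boolean mask of suspected biased neurons

# Toy activations: only neuron 1 responds differently to the two groups.
a = np.array([[0.20, 1.5, 0.10], [0.30, 1.4, 0.00]])
b = np.array([[0.25, 0.2, 0.10], [0.20, 0.3, 0.05]])
mask = biased_neuron_mask(a, b)
```

Because this test only compares forward-pass activations on paired prompts, it needs no labels or gradients, which is what makes the method cheap and interpretable.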

## Implementation Strategy of MLP Width Pruning

After the biased neurons are identified, a structured width pruning strategy is applied (reducing the number of hidden units in MLP layers), following a greedy order: neurons with high bias contribution but low structural importance are removed first. Structural importance is measured by a neuron's impact on the output, and bias contribution is evaluated by the change in fairness metrics.
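Mechanically, width pruning means deleting matched rows and columns of the MLP's projection matrices so the layer stays shape-consistent. The sketch below uses a plain two-matrix MLP for clarity; a gated Llama-style MLP would additionally prune the same indices from its gate projection.

```python
import numpy as np

def prune_mlp_width(W_up, W_down, prune_idx):
    """Structured width pruning of a toy MLP y = W_down @ relu(W_up @ x):
    removing hidden unit i deletes row i of W_up and column i of W_down,
    shrinking the hidden width while keeping the layer well-formed."""
    keep = np.setdiff1d(np.arange(W_up.shape[0]), prune_idx)
    return W_up[keep, :], W_down[:, keep]

# Toy layer: hidden width 4 -> 3 after pruning one flagged neuron.
rng = np.random.default_rng(0)
W_up, W_down = rng.normal(size=(4, 8)), rng.normal(size=(8, 4))
W_up2, W_down2 = prune_mlp_width(W_up, W_down, prune_idx=[2])
```

Because whole units are removed (not individual weights), the pruned matrices are simply smaller dense matrices, which is why the method yields real inference speed and memory gains without sparse kernels.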

## Experimental Validation: Effects and Advantages of Fairness Pruning

This method has been validated on mainstream models such as Llama-3.2 (1B/3B parameter versions) and Salamandra-2B. The results show that it can significantly reduce bias metrics in fairness benchmark tests while maintaining the overall performance of the model; the pruned model also gains practical benefits in inference speed and memory usage, making it suitable for resource-constrained environments.

## Practical Significance and Limitations of Fairness Pruning

Practical significance: it provides a practical tool for the responsible deployment of AI, suited to sensitive scenarios such as recruitment assistance, credit evaluation, and content moderation, where it can quickly reduce bias risks. Limitations: activation analysis requires designing test cases targeted at specific biases (i.e., prior knowledge of the bias), and pruning is irreversible, so related capabilities may be permanently lost.

## Future Directions and Conclusion

Future research directions include: developing more fine-grained neuron importance evaluation methods, exploring strategies combining pruning and fine-tuning, and extending fairness pruning to other components of the model (such as attention heads). Fairness Pruning represents an important advancement in the field of LLM bias mitigation, providing new possibilities for balancing performance and fairness, and will help promote the responsible development of AI.
