Section 01
Introduction: A New Post-Processing Scheme for Bias Mitigation in Large Language Models
This article introduces an open-source debiasing framework that targets social bias in large language models. At its core, it combines seven-dimensional confidence-signal extraction with a mixture-of-experts aggregator to debias model outputs as a post-processing step, without modifying model weights, and it achieves significant improvements on the BBQ benchmark. The framework addresses the high cost, poor generalization, and over-correction that plague traditional debiasing methods, offering a new path for AI ethics and fairness research.
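To make the post-processing idea concrete, here is a minimal sketch of confidence-gated debiasing over a model's answer scores. Everything below is an illustrative assumption, not the framework's actual design: the seven signals chosen (top probability, margin, entropy, normalized entropy, Gini impurity, minimum probability, inverse candidate count), the two experts, and the entropy-based gate are all hypothetical stand-ins for the framework's learned signal extractor and mixture-of-experts aggregator.

```python
import math

def softmax(xs):
    """Convert raw scores to a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def confidence_signals(probs):
    """Hypothetical 7-dimensional confidence vector over answer probabilities."""
    p = sorted(probs, reverse=True)
    ent = -sum(q * math.log(q) for q in p if q > 0)
    n = len(p)
    return [
        p[0],                                 # 1. top probability
        p[0] - (p[1] if n > 1 else 0.0),      # 2. margin over runner-up
        ent,                                  # 3. entropy
        ent / math.log(n) if n > 1 else 0.0,  # 4. normalized entropy
        1.0 - sum(q * q for q in p),          # 5. Gini impurity
        p[-1],                                # 6. minimum probability
        1.0 / n,                              # 7. inverse candidate count
    ]

def debias(scores, unknown_idx):
    """Post-processing sketch: a two-expert mixture gated by confidence.

    No model weights are touched; only the output scores are re-weighted.
    """
    probs = softmax(scores)
    sig = confidence_signals(probs)
    # Expert A: trust the model's distribution as-is.
    expert_trust = probs
    # Expert B: abstain toward the "unknown" option (as in BBQ's ambiguous
    # contexts), putting all mass on it.
    expert_abstain = [1.0 if i == unknown_idx else 0.0 for i in range(len(probs))]
    # Gate: weight the abstain expert by normalized entropy, so low model
    # confidence pushes toward abstention. A real aggregator would learn a
    # gating network over the full signal vector instead.
    g = sig[3]
    return [(1 - g) * t + g * a for t, a in zip(expert_trust, expert_abstain)]
```

For example, with three equally scored answers the gate saturates and the mixture collapses onto the "unknown" option, whereas a sharply peaked distribution passes through almost unchanged.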