MPU Framework: Privacy-Preserving Knowledge Unlearning for Large Language Models

MPU is an algorithm-agnostic, privacy-preserving unlearning framework based on multiple perturbed copies. Through server-side preprocessing and postprocessing modules, it achieves efficient knowledge unlearning while protecting both the model parameters and the privacy of the unlearning data.

Tags: Large Language Models, Knowledge Unlearning, Privacy Protection, Machine Unlearning, MPU Framework, Model Security, GDPR, AI Ethics
Published 2026-05-05 16:40 · Recent activity 2026-05-05 16:47 · Estimated read: 6 min

Section 01

Introduction: The MPU Framework, a Privacy-Preserving Knowledge Unlearning Solution for Large Language Models

This article introduces MPU (Multiple Perturbed Copies Unlearning), an algorithm-agnostic, privacy-preserving unlearning framework designed to address the dual non-disclosure constraint in knowledge unlearning for large language models: servers are unwilling to share original model parameters, and clients are unwilling to expose their unlearning datasets. Through a server-side preprocessing module (which generates perturbed copies) and a postprocessing module (which performs aggregation and denoising), MPU achieves efficient knowledge unlearning while protecting both the model parameters and the privacy of the unlearning data.


Section 02

Privacy Dilemma of Knowledge Unlearning

Knowledge unlearning for large language models faces a fundamental privacy challenge: traditional machine unlearning methods typically require servers to share model parameters or clients to expose their unlearning datasets, which is unacceptable in practice. Servers are concerned about leaking the original parameters (a core intellectual-property risk), while clients worry about exposing their unlearning data (sensitive information or trade secrets). Under this dual non-disclosure constraint, existing methods are difficult to deploy; the MPU framework is designed to resolve exactly this dilemma.


Section 03

Core Architecture and Flexibility of the MPU Framework

MPU is an algorithm-agnostic, privacy-preserving framework built around two core server-side modules:

Preprocessing Module

Generates multiple perturbed copies of the model: parameters are perturbed (noise is injected so that no single copy can reveal the original model), each copy is reparameterized (remaining functionally equivalent to the original model), and the multiple copies are then distributed to the client.
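The exact perturbation and reparameterization schemes are specific to MPU and are not reproduced here; the toy PyTorch sketch below (all names hypothetical) only illustrates the two ideas, using a hidden-unit permutation as a function-preserving reparameterization and independent zero-mean Gaussian noise of scale kappa as the parameter perturbation. The permutations stay on the server so the postprocessing module can undo them later.

```python
# Toy sketch of the preprocessing module, NOT the actual MPU scheme:
# each copy gets (1) a hidden-unit permutation that leaves the function
# unchanged and (2) independent zero-mean Gaussian parameter noise.
import copy
import torch
import torch.nn as nn

def reparameterize(model: nn.Sequential, perm: torch.Tensor) -> nn.Sequential:
    """Permute the hidden units of a Linear-ReLU-Linear stack.

    The permuted copy computes exactly the same function as the original,
    but its raw parameter tensors look different.
    """
    m = copy.deepcopy(model)
    fc1, fc2 = m[0], m[2]
    fc1.weight.data = fc1.weight.data[perm]      # permute output rows
    fc1.bias.data = fc1.bias.data[perm]
    fc2.weight.data = fc2.weight.data[:, perm]   # permute matching input columns
    return m

def make_perturbed_copies(model: nn.Sequential, num_copies: int, kappa: float):
    """Return num_copies reparameterized, noise-perturbed copies."""
    copies, perms = [], []
    hidden = model[0].out_features
    for _ in range(num_copies):
        perm = torch.randperm(hidden)
        c = reparameterize(model, perm)
        for p in c.parameters():                 # additive Gaussian noise
            p.data.add_(kappa * torch.randn_like(p))
        copies.append(c)
        perms.append(perm)
    return copies, perms                         # perms are kept server-side

if __name__ == "__main__":
    base = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    copies, perms = make_perturbed_copies(base, num_copies=4, kappa=0.05)
    print(f"{len(copies)} perturbed copies generated")
```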

Postprocessing Module

After clients return the locally updated copies, it performs inverse reparameterization, harmonic denoising, and secure aggregation to recover a single updated model.
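Continuing the same toy sketch, the postprocessing step can be approximated as follows: invert each copy's permutation to map it back to the original parameter space, then average the copies. Plain averaging stands in here for the paper's harmonic denoising and secure aggregation; it only shows why independent zero-mean noise shrinks (roughly by a factor of 1/sqrt(M)) once the M returned copies are combined.

```python
# Toy sketch of the postprocessing module, NOT the actual MPU denoising or
# secure aggregation: undo each permutation, then average the parameters.
import copy
import torch
import torch.nn as nn

def inverse_reparameterize(model: nn.Sequential, perm: torch.Tensor) -> nn.Sequential:
    """Undo the hidden-unit permutation applied during preprocessing."""
    inv = torch.argsort(perm)
    m = copy.deepcopy(model)
    fc1, fc2 = m[0], m[2]
    fc1.weight.data = fc1.weight.data[inv]
    fc1.bias.data = fc1.bias.data[inv]
    fc2.weight.data = fc2.weight.data[:, inv]
    return m

def aggregate(returned_copies, perms) -> nn.Sequential:
    """Align all returned copies and average them into one model."""
    aligned = [inverse_reparameterize(c, p) for c, p in zip(returned_copies, perms)]
    result = copy.deepcopy(aligned[0])
    with torch.no_grad():
        for name, param in result.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in aligned])
            param.copy_(stacked.mean(dim=0))     # simple mean as the denoising step
    return result
```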

In addition, MPU is algorithm-agnostic: clients can locally apply a variety of unlearning algorithms such as NPO, DPO, and GradAscent. On the engineering side, the project is developed with Python 3.11+ and includes components such as src/train.py (main entry point), src/eval.py (evaluation), and configs (Hydra configurations). It is open-sourced under the MIT license.
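To see why the framework is algorithm-agnostic, note that the client only has to supply some routine that maps a (perturbed) model and a forget set to an updated model. The sketch below uses plain gradient ascent on the forget-set loss as one such routine; an NPO or DPO objective could be swapped in without touching the rest of the pipeline. The function name and signature are illustrative and are not the project's actual API.

```python
# One pluggable client-side unlearning routine (illustrative only):
# gradient ascent on the forget-set loss. Swapping in NPO or DPO would
# only change the loss computed inside the loop.
import torch
import torch.nn.functional as F

def gradient_ascent_unlearn(model, forget_loader, lr=1e-5, steps=10):
    """Maximize the loss on the forget set for a fixed number of steps."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    batches = iter(forget_loader)
    for _ in range(steps):
        try:
            inputs, targets = next(batches)
        except StopIteration:
            batches = iter(forget_loader)
            inputs, targets = next(batches)
        opt.zero_grad()
        loss = -F.cross_entropy(model(inputs), targets)  # ascend, not descend
        loss.backward()
        opt.step()
    return model

# Each perturbed copy is unlearned locally with a routine like this and only
# the updated parameters are sent back; the server never sees the forget set.
```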


Section 04

Experimental Validation and Benchmark Testing

MPU has been validated on standard benchmarks such as TOFU, MUSE, and WMDP. Experimental configurations are managed by Hydra, allowing customization of hyperparameters (number of copies PUM_M_LIST, noise scale PUM_KAPPA, reparameterization switch). The results show that MPU effectively achieves privacy-preserving knowledge unlearning while maintaining model performance.
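As an illustration only, the following loop shows how a small sweep over the copy count and noise scale could be wired to the toy functions from the earlier sketches; the actual experiments are launched through src/train.py with Hydra overrides, not through this code.

```python
# Hypothetical hyperparameter sweep; variable names mirror the knobs
# mentioned above, but the loop itself is not part of the project.
PUM_M_LIST = [2, 4, 8]   # numbers of perturbed copies to try
PUM_KAPPA = 0.05         # noise scale (a single value in this sketch)

for m in PUM_M_LIST:
    print(f"configuring run: copies={m}, noise scale={PUM_KAPPA}")
    # copies, perms = make_perturbed_copies(base, num_copies=m, kappa=PUM_KAPPA)
    # updated = [gradient_ascent_unlearn(c, forget_loader) for c in copies]
    # model = aggregate(updated, perms)
```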


Section 05

Application Prospects and Value of the MPU Framework

The significance of the MPU framework includes:

  1. Privacy Compliance: Helps meet the "right to be forgotten" requirements of regulations like GDPR;
  2. Intellectual Property Protection: Enables knowledge updates without disclosing model details;
  3. Multi-Party Collaboration: Supports secure model updates in untrusted multi-party environments;
  4. Algorithm Compatibility: Seamlessly integrates with existing unlearning algorithms, lowering adoption barriers.

As LLMs are increasingly applied in sensitive fields, MPU will become an important tool for model governance.


Section 06

Conclusion: Innovative Significance of the MPU Framework

The MPU framework successfully addresses the privacy dilemma of knowledge unlearning for large language models through the multiple perturbed copies mechanism. While protecting server model parameters and the privacy of client unlearning data, it maintains unlearning effectiveness and model performance, providing an important technical foundation for building more secure and trustworthy artificial intelligence systems.