Zing Forum


RePAIR: Interactive Machine Unlearning, Empowering Users to Control the Knowledge Boundaries of Large Models

This article introduces the RePAIR framework, which implements a new paradigm of Interactive Machine Unlearning (IMU). Users can instruct the model to forget specific knowledge during inference via natural language commands. The core STAMP method guides MLP activations to a rejection subspace through pseudoinverse updates, enabling efficient, on-device knowledge deletion without retraining.

Tags: RePAIR, machine unlearning, interactive unlearning, user control, STAMP, privacy protection, model repair, on-device computation
Published 2026-04-14 22:44 · Recent activity 2026-04-15 09:55 · Estimated read 5 min

Section 01

Introduction

This article introduces the RePAIR framework and its new paradigm of Interactive Machine Unlearning (IMU): users instruct the model to forget specific knowledge during inference via natural language commands. The core STAMP method guides MLP activations into a rejection subspace through pseudoinverse updates, enabling efficient, on-device knowledge deletion without retraining. This addresses the selective-unlearning challenge for large models and returns control over data to users.


Section 02

Background: Memory Dilemmas of Large Models and Limitations of Existing Methods

Large models absorb massive amounts of data during training and can pick up harmful knowledge (e.g., instructions for making dangerous items), misinformation (pseudoscientific advice), and personal private information, yet they lack a selective unlearning mechanism. Existing machine unlearning methods are provider-centric, requiring retraining or complex post-processing; ordinary users cannot independently control whether their data is forgotten, which raises privacy and ethical issues.


Section 03

Methodology: Interactive Machine Unlearning Paradigm and System Architecture

RePAIR proposes the Interactive Machine Unlearning (IMU) paradigm, where users trigger unlearning in real time via natural language commands. The system consists of three components: a Watchdog model to detect unlearning intent, a Surgeon model to generate repair procedures (identify content to forget, plan steps, generate parameter modification instructions), and a Patient model to execute parameter updates, achieving separation of responsibilities.


Section 04

Core Technology: Principles and Advantages of the STAMP Method

STAMP (Steering Through Activation Manipulation with PseudoInverse) is the core technology of RePAIR: it requires no retraining, operates on single samples, and is highly efficient. It builds on the observation that model knowledge is encoded in MLP activation patterns. By guiding activations into a rejection subspace via pseudoinverse updates, it makes the model refuse to answer the corresponding inputs. A low-rank variant reduces computational complexity, completes the operation in milliseconds, and supports on-device execution.
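The pseudoinverse update can be illustrated with a minimal linear-algebra sketch, under the simplest reading of the method: a single MLP projection weight `W`, forget-prompt activations `K`, and a refusal target `V_target`. All names, shapes, and the specific update form are assumptions for illustration, not the paper's exact formulation. The minimal-norm update mapping every forget activation onto the rejection target is `dW = (V_target - W K) K^+`, and its rank is at most the number of forget samples, which is one way a low-rank variant arises.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 64, 32, 4                # hidden width, output width, forget samples

W = rng.standard_normal((d_out, d_in))    # original MLP projection weight (assumed)
K = rng.standard_normal((d_in, n))        # MLP activations for the forget prompts
r = rng.standard_normal((d_out, 1))       # a "refusal" direction in output space
V_target = np.repeat(r, n, axis=1)        # steer every forget prompt to refusal

# Minimal-norm update such that (W + dW) @ K == V_target:
dW = (V_target - W @ K) @ np.linalg.pinv(K)
W_new = W + dW

# Forget prompts now land on the rejection target.
assert np.allclose(W_new @ K, V_target)

# Low-rank storage: dW factors into two thin matrices of rank <= n,
# which keeps the on-device patch small.
U, P = V_target - W @ K, np.linalg.pinv(K)
assert np.allclose(dW, U @ P)
```

Because `dW` is a product of a `(d_out, n)` and an `(n, d_in)` matrix, storing the two factors instead of the dense update is what makes a millisecond-scale, on-device patch plausible.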


Section 05

Experimental Validation: Results and Baseline Comparison

RePAIR was tested in three scenarios:
1. Harmful knowledge suppression: the forgetting score approaches 0 while 84.47% of task performance is retained.
2. Misinformation correction: the F-RL metric is 0.00, i.e., the misinformation is completely forgotten.
3. Personal data erasure: the R-RL metric is 0.88, accurately erasing the target data while preserving unrelated knowledge.
Compared with 6 baselines, RePAIR performs best on unlearning completeness, model utility, efficiency, and user control.


Section 06

Technical Highlights and Application Scenarios

Technical highlights:
1. User autonomy without relying on providers.
2. No retraining; millisecond-level unlearning.
3. On-device execution for privacy protection.
4. Extensible to multimodal models.
Application scenarios: personal privacy protection (GDPR compliance), enterprise data security, real-time fact-checking, and safety compliance.


Section 07

Limitations and Future Research Directions

Limitations: there is no complete theoretical guarantee of unlearning, and forgotten knowledge may be indirectly recovered; side effects are hard to control (over- or under-unlearning); the method may be vulnerable to adversarial attacks; and interpretability needs improvement. Future directions: multimodal unlearning, progressive unlearning, reversible unlearning, and federated unlearning.