Section 01
Introduction: The CiPO Framework Solves the Unlearning Challenge for Large Reasoning Models
This article introduces CiPO (Counterfactual Unlearning through Iterative Preference Optimization), a framework that generates counterfactual reasoning trajectories and uses them for iterative preference optimization. CiPO is designed to completely remove target knowledge while preserving the model's reasoning ability. It addresses the central dilemma of machine unlearning for Large Reasoning Models (LRMs): how to forget specific knowledge without degrading general reasoning.
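The introduction does not state which preference-optimization objective CiPO uses. The sketch below assumes a DPO-style (Direct Preference Optimization) pairwise loss to illustrate the idea. The counterfactual trajectory, which avoids the target knowledge, is treated as the "chosen" response, and the original trajectory, which reveals it, is treated as "rejected". The function name `dpo_pair_loss`, the `beta` value, and this chosen/rejected assignment are illustrative assumptions, not the framework's confirmed method.

```python
import math


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO-style loss for one preference pair (hypothetical unlearning setup).

    Assumption: 'chosen' is a counterfactual reasoning trajectory without the
    target knowledge; 'rejected' is the original trajectory that exposes it.
    Inputs are summed token log-probabilities under the trained policy and a
    frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the counterfactual over the original trajectory.
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With no preference shift relative to the reference, the loss equals log(2).
print(round(dpo_pair_loss(-10.0, -10.0, -10.0, -10.0), 4))  # 0.6931
```

Minimizing this loss pushes the model toward the counterfactual trajectory and away from the one containing the target knowledge. An "iterative" scheme would presumably regenerate counterfactual pairs with the updated model between rounds, though the article's exact schedule is not given here.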