PEFT-Arena: Re-examining Parameter-Efficient Fine-Tuning from the Stability-Plasticity Perspective

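Tags: PEFT, parameter-efficient fine-tuning, LoRA, OFT, large language models, stability-plasticity, model forgetting, orthogonal fine-tuning, LLM fine-tuning, transfer learning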

Published 2026-05-28 01:59 | Recent activity 2026-05-29 10:52 | Estimated read 8 min

Section 01

PEFT-Arena: Re-examining Parameter-Efficient Fine-Tuning from the Stability-Plasticity Perspective

Researchers from The Chinese University of Hong Kong, Westlake University, and the Max Planck Institute for Intelligent Systems jointly propose PEFT-Arena, a benchmark that for the first time systematically evaluates how parameter-efficient fine-tuning (PEFT) methods trade off target-task adaptation against retention of pre-trained capabilities. The results point to an advantage for Orthogonal Fine-Tuning (OFT) on the stability-plasticity frontier. The study addresses a gap in the current PEFT evaluation paradigm, which ignores retention of pre-trained capabilities, and offers a new perspective for understanding PEFT methods.

Section 02

Background and Motivation: Blind Spots in PEFT Evaluation and the Stability-Plasticity Dilemma

Parameter-efficient fine-tuning (PEFT) methods such as LoRA, Adapters, and Prompt Tuning have become the de facto standard for large language models, promising adaptation to downstream tasks with minimal computational overhead. However, current evaluations focus only on target-task accuracy and ignore retention of pre-trained capabilities: a model may forget general abilities, such as instruction following and commonsense reasoning, while adapting to a new task. This is the "stability-plasticity dilemma" known from cognitive science, where plasticity is the ability to learn new domains and stability is the degree to which pre-trained capabilities are retained.

Section 03

PEFT-Arena Benchmark Design

PEFT-Arena is the first comprehensive benchmark that simultaneously evaluates target task performance and general capability retention, proposed by teams from The Chinese University of Hong Kong, Westlake University, and the Max Planck Institute for Intelligent Systems. The benchmark covers:

Model families: Qwen2.5-7B, Llama3.2-3B-Instruct
Training paradigms: Supervised Fine-Tuning (SFT), GRPO-based Reinforcement Learning (RLVR)
Task domains: target tasks (mathematical reasoning, medical QA); general capabilities (IFEval instruction following, Natural Questions (NQ) question answering, BIG-Bench Hard (BBH))

Each configuration reports the target-task accuracy and the average score across the general-capability benchmarks.
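As a rough illustration of this reporting scheme, the sketch below stores the scores of one configuration and derives the general-capability average. The class name `ArenaResult`, its fields, the unweighted mean, and the numbers are all assumptions made for illustration; they are not taken from the PEFT-Arena code or results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ArenaResult:
    """One (model, paradigm, method) configuration. Hypothetical structure."""
    method: str
    target_acc: float      # accuracy on the target task, in percent
    general_scores: dict   # benchmark name -> score, in percent

    @property
    def general_avg(self) -> float:
        # Assumption: the general score is an unweighted mean over benchmarks.
        return mean(list(self.general_scores.values()))


# Placeholder numbers, NOT results from the benchmark.
example = ArenaResult(
    method="some-peft-method",
    target_acc=40.0,
    general_scores={"IFEval": 60.0, "NQ": 30.0, "BBH": 45.0},
)
print(example.method, example.target_acc, example.general_avg)
```

Reporting the pair (target accuracy, general average) rather than a single number is what makes the trade-off visible.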

Section 04

Key Findings: Stability-Plasticity Performance of PEFT Methods

Experiments reveal key phenomena:

Full Fine-Tuning Cost: Target-task performance improves, but general capabilities drop sharply (e.g., Qwen math SFT: target accuracy rises from 35.30% to 50.63%, while general capabilities fall from 46.97% to 34.22%).
OFT Advantage: At a comparable parameter count, OFT reaches similar target performance with minimal loss of general capabilities (OFT-block32 in Qwen math SFT: target accuracy 46.93%, general capabilities down by only 2.6 percentage points).
Catastrophic Failure of PiSSA: In some configurations, target performance does not improve, yet general capabilities are severely damaged (PiSSA in Llama math SFT: general capabilities fall from 53.03% to 9.74%).
RLVR vs. SFT Differences: RLVR keeps general capabilities relatively intact while still improving target performance.
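To make the size of these shifts easier to compare, the short sketch below converts the figures quoted above into percentage-point changes. Only the numbers come from the text; the dictionary layout and the helper `delta_pp` are illustrative, and the OFT entry assumes the same 35.30% baseline as the full fine-tuning row because both refer to Qwen math SFT.

```python
# Figures quoted in the findings above, in percent: (before, after).
reported = {
    "Qwen math SFT, full fine-tuning": {
        "target": (35.30, 50.63),
        "general": (46.97, 34.22),
    },
    # Assumption: same pre-fine-tuning target baseline as the row above.
    "Qwen math SFT, OFT-block32": {
        "target": (35.30, 46.93),
    },
    "Llama math SFT, PiSSA": {
        "general": (53.03, 9.74),
    },
}


def delta_pp(before: float, after: float) -> float:
    """Change in percentage points, rounded to two decimals."""
    return round(after - before, 2)


deltas = {
    name: {metric: delta_pp(*pair) for metric, pair in metrics.items()}
    for name, metrics in reported.items()
}

for name, change in deltas.items():
    print(name, change)
```

Read this way, full fine-tuning gains about 15 points on the target task at a cost of nearly 13 points of general capability, whereas the quoted OFT configuration gains about 12 points while losing only 2.6.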

Section 05

Mechanism Analysis from a Geometric Perspective

The differences between PEFT methods are explained from two geometric perspectives:

Weight Space Structure: OFT updates weights through orthogonal transformations, which avoids interfering with the key directions that encode pre-trained knowledge; low-rank methods may introduce destructive perturbations along key singular vector directions.
Activation Space Stability: A "Capability-Conditioned Drift" metric is introduced to measure representation changes. The degree of forgetting turns out to be closely related to non-isometric representation distortion: general capabilities are lost most severely when the geometric structure of the activation space is distorted.
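The weight-space intuition can be checked on a toy example. The sketch below assumes the multiplicative form W' = RW with R orthogonal for the OFT-style update and the additive form W' = W + BA for the low-rank update, then compares how much each changes the pairwise inner products between weight columns (the Gram matrix). The matrices, the rotation angle, and the helper names are made up for illustration; this is not the paper's analysis or its drift metric.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gram(W):
    """Pairwise inner products between the columns of W."""
    return matmul(transpose(W), W)


def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


# Toy 2x3 "pre-trained" weight matrix (illustrative values).
W = [[1.0, 0.5, -0.3],
     [0.2, 1.0, 0.8]]

# Orthogonal update: a 2x2 rotation applied on the left.
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
W_orth = matmul(R, W)

# Additive rank-1 update: W + b a^T.
b = [[0.5], [-0.4]]
a = [[0.3, -0.2, 0.6]]
BA = matmul(b, a)
W_lowrank = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, BA)]

orth_change = max_abs_diff(gram(W), gram(W_orth))
lowrank_change = max_abs_diff(gram(W), gram(W_lowrank))
print("Gram change, orthogonal update:", orth_change)
print("Gram change, low-rank update:  ", lowrank_change)
```

Because R^T R = I, the orthogonal update leaves the Gram matrix unchanged up to floating-point error, so norms and angles between columns are preserved. The additive update carries no such guarantee, which is one way to picture how it can distort the existing geometry.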

Section 06

Path Backtracking Strategy: Finding a Better Operating Point

The study finds that the final SFT checkpoint often "overshoots" the optimal trade-off point. Interpolating along the fine-tuning path yields intermediate models that balance the target task and general capabilities better than the final model does. On this basis the authors propose the "Path Backtracking" strategy: instead of using the final model, select a Pareto-optimal checkpoint from the optimization trajectory. The strategy adds no training cost and significantly improves the overall performance of the model.
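A minimal sketch of this idea follows, assuming linear interpolation between the pre-trained and fine-tuned weights and a simple selection rule (highest target score among Pareto-optimal points whose general score stays within a budget of the pre-trained model). The function names, the budget rule, and every score below are hypothetical; the paper's actual procedure may differ.

```python
def interpolate(theta_pre, theta_ft, alpha):
    """Point on the straight path from pre-trained (alpha=0) to fine-tuned (alpha=1)."""
    return {
        name: [(1 - alpha) * p + alpha * f for p, f in zip(theta_pre[name], theta_ft[name])]
        for name in theta_pre
    }


def pareto_front(points):
    """Keep points not dominated on (target, general); higher is better for both."""
    front = []
    for p in points:
        dominated = any(
            q["target"] >= p["target"] and q["general"] >= p["general"]
            and (q["target"] > p["target"] or q["general"] > p["general"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def backtrack(points, max_general_drop):
    """Hypothetical rule: best target score within a general-capability budget."""
    base_general = next(p["general"] for p in points if p["alpha"] == 0.0)
    allowed = [p for p in pareto_front(points)
               if base_general - p["general"] <= max_general_drop]
    return max(allowed, key=lambda p: p["target"])


# Tiny weight example: two parameters in one tensor.
midpoint = interpolate({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)

# Made-up evaluation scores along the path, NOT results from the paper.
candidates = [
    {"alpha": 0.0, "target": 35.0, "general": 47.0},
    {"alpha": 0.25, "target": 41.0, "general": 46.5},
    {"alpha": 0.5, "target": 46.0, "general": 45.0},
    {"alpha": 0.75, "target": 49.0, "general": 41.0},
    {"alpha": 1.0, "target": 48.5, "general": 34.0},  # final checkpoint overshoots
]

front = pareto_front(candidates)
chosen = backtrack(candidates, max_general_drop=3.0)
print("Pareto-optimal alphas:", [p["alpha"] for p in front])
print("Chosen operating point:", chosen)
```

In this toy setting the final checkpoint is dominated by an earlier point on the path, so it drops out of the Pareto front. Which surviving point to deploy then depends on how much general-capability loss is acceptable.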

Section 07

Practical Implications and Future Directions

Implications of the study for AI practice:

Evaluating PEFT needs to focus on both target performance and general capability retention; a single metric is easily misleading.
OFT's advantage in the stability-plasticity trade-off makes it a preferred choice for resource-constrained scenarios.
The path backtracking strategy offers a plug-and-play improvement for existing fine-tuning pipelines and makes foundation models more efficient to reuse.

The team has open-sourced the code and benchmark data. Future work will further explore the theoretical basis of PEFT and topics related to model reliability and safety.
