Section 01
SETA: A Mixture of Sparse Experts Architecture to Solve the Dilemma of Continual Learning in Large Models
This article introduces SETA (Mixture of Sparse Experts for Task-Agnostic Continual Learning), a framework designed to resolve the conflict between plasticity and stability in the continual learning of large language models. Using adaptive sparse subspace decomposition and an expert routing mechanism, SETA learns new knowledge while preventing catastrophic forgetting. The framework partitions the parameter space into unique experts, which hold task-specific knowledge, and shared experts, which hold knowledge common across tasks; a dynamic routing mechanism then selects which experts process each input, making continual learning efficient.
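The overview above leaves the mechanics open. The following is a minimal NumPy sketch of the general idea, not SETA's actual implementation. It assumes that each expert owns a disjoint binary mask over one dense weight matrix (one possible reading of "sparse subspace decomposition"), that a single shared expert is always active, and that a linear router scores unique experts from the input alone with no task ID, which is what "task-agnostic" suggests. The class `SparseExpertLayer` and all of its methods and parameters are hypothetical.

```python
import numpy as np


class SparseExpertLayer:
    """Hypothetical mixture-of-sparse-experts layer, for illustration only.

    Assumption: every expert owns a disjoint binary mask over a single dense
    weight matrix W, so each expert's parameters form a sparse subspace that
    no other expert can overwrite.
    """

    def __init__(self, d_in, d_out, shared_frac=0.25, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(d_out, d_in))
        self.free = np.ones(self.W.shape, dtype=bool)     # entries no expert owns yet
        self.frozen = np.zeros(self.W.shape, dtype=bool)  # entries locked after a task
        self.shared_mask = self._claim(shared_frac)       # cross-task shared expert
        self.unique_masks = []                            # task-specific experts
        self.router = np.zeros((0, d_in))                 # one routing row per unique expert

    def _claim(self, frac):
        """Assign a random fraction of the still-free entries to a new expert."""
        idx = np.flatnonzero(self.free)
        k = max(1, int(frac * idx.size))
        chosen = self.rng.choice(idx, size=k, replace=False)
        mask = np.zeros(self.W.size, dtype=bool)
        mask[chosen] = True
        mask = mask.reshape(self.W.shape)
        self.free &= ~mask                                # keep subspaces disjoint
        return mask

    def add_unique_expert(self, frac=0.2):
        """Freeze all earlier unique experts, then carve out a subspace for a new one."""
        for m in self.unique_masks:
            self.frozen |= m
        self.unique_masks.append(self._claim(frac))
        row = self.rng.normal(0.0, 0.1, size=(1, self.W.shape[1]))
        self.router = np.vstack([self.router, row])

    def forward(self, x, top_k=1):
        """Shared expert always runs; the router picks top_k unique experts from x alone."""
        y = (self.W * self.shared_mask) @ x
        if not self.unique_masks:
            return y, []
        scores = self.router @ x
        top = np.argsort(scores)[::-1][:top_k]
        gates = np.exp(scores[top] - scores[top].max())
        gates /= gates.sum()                              # softmax over the selected experts
        for g, i in zip(gates, top):
            y = y + g * ((self.W * self.unique_masks[i]) @ x)
        return y, [int(i) for i in top]

    def sgd_step(self, grad_W, lr=0.1):
        """Apply a gradient everywhere except frozen entries (earlier task experts)."""
        self.W -= lr * grad_W * ~self.frozen


# Usage: two tasks arrive in sequence; routing needs no task label.
layer = SparseExpertLayer(d_in=8, d_out=4)
layer.add_unique_expert()            # subspace for task 1
layer.add_unique_expert()            # task 1 expert frozen, task 2 expert added
y, chosen = layer.forward(np.ones(8), top_k=1)
print(y.shape, chosen)
```

Because the masks are disjoint and earlier unique experts are frozen, gradient steps for a new task cannot overwrite their parameters, which is how this sketch trades plasticity for stability. It leaves the shared expert trainable, so it does not capture how SETA keeps shared parameters from drifting.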