Zing Forum

DUME: A New MoE Method for Dynamic Expert Model Recombination Without Training

DUME combines expert models dynamically, with no additional training, by using the closed-form solution of ridge regression. It retains 97.6% of the original experts' performance and supports adding new experts on the fly, which addresses the problem of integrating experts from multiple domains.

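Tags: Mixture of Experts, Model Integration, Ridge Regression, Domain Experts, Multi-task Learning, Training-free, Dynamic Expansion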
Published 2026-03-31 22:05 | Recent activity 2026-04-01 09:20 | Estimated read 6 min

Section 01

DUME: Guide to the New MoE Method for Dynamic Expert Model Recombination Without Training

Core Guide to DUME

DUME (Dynamic Upcycling MoE) is a new MoE method that dynamically recombines multi-domain expert models without additional training. It integrates the experts via the closed-form solution of ridge regression, retaining 97.6% of the original experts' performance while supporting the dynamic addition of new experts. This addresses the cost and efficiency challenges of multi-domain expert integration.

This article covers the background, technical approach, performance results, dynamic expansion, and application prospects.

Section 02

Specialization Dilemma of Large Models and Limitations of MoE Architecture

Background: Challenges of Large Models and MoE

Specialization Dilemma of Large Models

  • Over-specialization: Domain-finetuned models lose general capabilities
  • Difficult multi-domain integration: Tasks interfere with each other and cause catastrophic forgetting
  • High cost: Training each expert separately and then integrating them consumes large amounts of resources

Limitations of Traditional MoE

Although an MoE architecture can combine experts, existing methods still require multi-task fine-tuning to coordinate them, so pre-trained domain experts cannot be used in a "plug-and-play" fashion.

Section 03

Core Solution of DUME: Expert Recombination Without Training

DUME Solution: Dynamic Upcycling for Expert Integration

The core innovation of DUME is that it recombines multiple domain expert models with no additional training at all:

  • It uses the closed-form solution of ridge regression to compute the optimal integration parameters directly, skipping iterative training
  • Advantages: integration completes in seconds, new experts can be added dynamically, and the solution is the mathematical optimum of the regression objective rather than the result of a stochastic training run

Because the method keeps the original expert weights unchanged, it avoids catastrophic forgetting at the root.
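For reference, the standard closed-form solution of ridge regression can be written as below. The notation is an assumption made for illustration and is not taken from the DUME paper: $X$ is a feature matrix, $Y$ the matrix of targets, $W$ the integration parameters being solved for, and $\lambda > 0$ the L2 regularization strength.

```latex
W^{*} \;=\; \arg\min_{W}\; \lVert XW - Y \rVert_F^{2} + \lambda \lVert W \rVert_F^{2}
      \;=\; \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} Y
```

Since $X^{\top}X + \lambda I$ is positive definite for any $\lambda > 0$, the inverse always exists, and the whole computation amounts to a single linear solve rather than a gradient-descent loop.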

Section 04

Technical Principle: Ridge Regression and Expert Routing Design

Technical Principle: Ridge Regression-Driven Gating Mechanism

DUME transforms the calculation of gating parameters into a ridge regression problem:

  1. Treat each expert's output as a feature
  2. Goal: Find combination weights such that the weighted sum of expert outputs approximates the ideal target
  3. Obtain the optimal weights directly from the closed-form solution of linear regression with L2 regularization (ridge regression)

This design converts "learning" into "computation", making integration several orders of magnitude faster than iterative training.
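The three steps above can be sketched with NumPy. This is a minimal illustration of ridge regression over expert outputs as described in this article, not the DUME implementation: the function name `ridge_combination_weights`, the array layout, the value of `lam`, and the synthetic data are all assumptions.

```python
import numpy as np

def ridge_combination_weights(expert_outputs, target, lam=1e-2):
    """Closed-form ridge solution for per-expert mixing weights.

    expert_outputs: shape (n_samples, n_experts); column j holds the
                    output of expert j for each sample (assumed layout).
    target:         shape (n_samples,), the desired output.
    lam:            L2 regularization strength (assumed value).
    """
    X = np.asarray(expert_outputs, dtype=float)
    y = np.asarray(target, dtype=float)
    n_experts = X.shape[1]
    # Regularized normal equations: (X^T X + lam * I) w = X^T y
    gram = X.T @ X + lam * np.eye(n_experts)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(gram, X.T @ y)

# Synthetic data: the target is a known mix of three stand-in "experts".
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([0.7, 0.2, 0.1])
y = X @ true_w

w = ridge_combination_weights(X, y, lam=1e-6)
```

With a very small `lam` and a noise-free target, the recovered `w` should be close to `true_w`. No optimizer, learning rate, or training loop is involved, which is the sense in which "learning" becomes "computation".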

Section 05

Performance Evaluation: Maintaining and Surpassing Original Expert Capabilities

Performance Verification: Excellent Integration Effect

  • Causal Language Modeling: Retains 97.6% of the original experts' domain performance
  • Reasoning Tasks: Reaches 102.1% of the original experts' performance, surpassing them thanks to complementary effects between experts
  • Comparison with Baselines: Consistently outperforms existing model integration methods, and the integration process completes in seconds

This verifies DUME's dual advantages in performance and efficiency.

Section 06

Dynamic Expansion: Supporting Incremental Expert Integration

Dynamic Expansion and Continuous Learning

DUME supports adding new experts at any time:

  • Adding a new domain expert only requires recalculating the closed-form solution; nothing is retrained
  • The integrated model can still be fine-tuned afterwards to adapt it to specific scenarios

This suits enterprises that build their expert libraries gradually and want their knowledge systems to keep evolving.
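Incremental expansion can be sketched in the same spirit. The example below assumes, for illustration only, that each expert contributes one feature column and that adding an expert means appending a column and re-solving the ridge system. The helper names `solve_ridge` and `add_expert` and the synthetic data are hypothetical and do not come from the DUME code base.

```python
import numpy as np

def solve_ridge(X, y, lam=1e-2):
    """Closed-form ridge weights: (X^T X + lam * I)^{-1} X^T y."""
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)

def add_expert(X, new_expert_output):
    """Return a new feature matrix with one extra expert column.

    The existing columns are copied unchanged, mirroring the idea that
    the original experts are left untouched.
    """
    return np.column_stack([X, new_expert_output])

rng = np.random.default_rng(1)
X_old = rng.normal(size=(400, 2))   # two existing experts
new_col = rng.normal(size=400)      # outputs of a newly added expert
# Synthetic target that partly depends on the new expert.
y = X_old @ np.array([0.5, 0.3]) + 0.2 * new_col

w_old = solve_ridge(X_old, y)       # weights before expansion
X_new = add_expert(X_old, new_col)
w_new = solve_ridge(X_new, y)       # one re-solve, no training loop

err_old = np.linalg.norm(X_old @ w_old - y)
err_new = np.linalg.norm(X_new @ w_new - y)
```

On this toy data the fit error drops once the third expert is included, and the only cost of the update is one additional linear solve.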

Section 07

Application Prospects and Open Source Value

Application Prospects and Open Source Contributions

  • Lower Barrier to Entry: Teams with limited resources can also build multi-domain expert systems
  • Enterprise Applications: Supports rapid deployment and incremental expansion
  • Open Source Code: Released at github.com/gensyn-ai/dume; possible directions to explore include multilingual, multimodal, and federated learning scenarios

DUME offers a practical and efficient solution for the field of model integration.