Section 01
[Introduction] ICLR 2026 Oral Paper Reveals Optimal MoE Sparsity and Capability-Specific Scaling Laws
A study on optimizing sparsity in Mixture-of-Experts (MoE) models, conducted by a joint team from Tokyo Institute of Technology and RIKEN, has been accepted as an Oral paper at ICLR 2026. The work proposes that reasoning and memory capabilities in MoE models follow distinct scaling laws, and it fully open-sources 65 pre-trained checkpoints along with the associated code, offering a new paradigm for MoE architecture design.
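For readers less familiar with the term, "sparsity" in an MoE layer is commonly quantified as the fraction of experts that are not activated for a given token. The sketch below illustrates that common definition only; the exact definition and the scaling-law forms used in the paper are not reproduced here, and the function and parameter names are illustrative assumptions.

```python
# Minimal sketch (not from the paper): one common way to quantify MoE sparsity
# is the share of experts left inactive per token. Names below are illustrative.

def moe_sparsity(total_experts: int, active_experts: int) -> float:
    """Sparsity = fraction of experts NOT activated for a given token."""
    if not 0 < active_experts <= total_experts:
        raise ValueError("active_experts must be in (0, total_experts]")
    return 1.0 - active_experts / total_experts

# Example: a layer with 64 experts that routes each token to 8 of them
# is 1 - 8/64 = 0.875 sparse under this definition.
print(moe_sparsity(total_experts=64, active_experts=8))  # 0.875
```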