Section 01
Oryx Architecture: A New Breakthrough in Hybrid Models with Dynamic Attention Switching
Researchers propose Oryx, a hybrid architecture that departs from the static alternation of mixer layers used in traditional hybrid models. Instead, it switches mixers dynamically at the sequence level, with more than 90% of its parameters shared. At the 1.4B-parameter scale, it outperforms single-mixer baselines, suggesting a new direction for long-sequence modeling. The original authors are the Oryx research team, and the source is arXiv (published 2026-05-27, link: http://arxiv.org/abs/2605.28769v1).
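To make "sequence-level mixer switching with shared parameters" concrete, here is a minimal toy sketch in plain Python. It is not the Oryx implementation. The summary above does not say which mixers are used or how the switch is decided, so everything here is an assumption for illustration: the two mixers (causal softmax attention and a decayed linear-attention recurrence), the per-channel `decay` vector, the length-threshold rule in `choose_mixer`, and all names.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random matrix; the scale is only there to keep values bounded."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyHybridLayer:
    """Hypothetical layer: shared Q/K/V/O projections, two interchangeable mixers."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)
        self.d = d
        # Projections used by BOTH mixers (the shared parameters).
        self.Wq = rand_matrix(d, d, rng)
        self.Wk = rand_matrix(d, d, rng)
        self.Wv = rand_matrix(d, d, rng)
        self.Wo = rand_matrix(d, d, rng)
        # Mixer-specific parameters: a per-channel decay for the linear mixer only.
        self.decay = [0.9] * d

    def shared_fraction(self):
        """Fraction of this toy layer's parameters used by both mixers."""
        shared = 4 * self.d * self.d
        return shared / (shared + len(self.decay))

    def _softmax_mixer(self, q, k, v):
        """Causal softmax attention: position t attends to positions 0..t."""
        out = []
        for t in range(len(q)):
            scores = [
                sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(self.d)
                for s in range(t + 1)
            ]
            m = max(scores)  # subtract the max for numerical stability
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            out.append(
                [sum(w[s] * v[s][j] for s in range(t + 1)) / z for j in range(self.d)]
            )
        return out

    def _linear_mixer(self, q, k, v):
        """Decayed linear attention as a recurrence over a d x d state."""
        d = self.d
        state = [[0.0] * d for _ in range(d)]
        out = []
        for qt, kt, vt in zip(q, k, v):
            for i in range(d):
                for j in range(d):
                    state[i][j] = self.decay[i] * state[i][j] + kt[i] * vt[j]
            out.append([sum(qt[i] * state[i][j] for i in range(d)) for j in range(d)])
        return out

    def forward(self, xs, mixer):
        """Run one sequence through the layer with a single mixer for the whole sequence."""
        q = [matvec(self.Wq, x) for x in xs]
        k = [matvec(self.Wk, x) for x in xs]
        v = [matvec(self.Wv, x) for x in xs]
        if mixer == "softmax":
            mixed = self._softmax_mixer(q, k, v)
        elif mixer == "linear":
            mixed = self._linear_mixer(q, k, v)
        else:
            raise ValueError(f"unknown mixer: {mixer!r}")
        return [matvec(self.Wo, m) for m in mixed]


def choose_mixer(seq_len, threshold=4096):
    """Hypothetical switching rule: short sequences use softmax, long ones linear."""
    return "softmax" if seq_len <= threshold else "linear"
```

Calling `layer.forward(xs, choose_mixer(len(xs)))` applies one mixer to the entire sequence, which is the sense of "sequence-level" assumed here. In this toy layer only the `decay` vector is mixer-specific, so `shared_fraction()` exceeds 0.9 for any `d >= 3`; that figure describes the sketch, not the paper's accounting.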