Section 01
Core Introduction to the Orthrus Framework
Orthrus is a dual-view diffusion decoding framework for accelerating large language model (LLM) inference. It pairs the generation quality of autoregressive decoding with the parallel decoding speed of diffusion models, reaching up to 7.8x faster inference while keeping the output lossless relative to the base autoregressive model. Orthrus is built on the Qwen3 model series and uses parameter-efficient fine-tuning, so it adds negligible memory overhead. This makes it a new approach to improving LLM inference efficiency.
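This section does not say how Orthrus keeps parallel decoding lossless. One widely used mechanism is draft-and-verify, familiar from speculative decoding: a fast parallel view proposes several tokens at once, and the autoregressive view keeps only the longest prefix that matches its own predictions. The sketch below shows that generic technique. It is an assumption for illustration, not Orthrus's confirmed algorithm. It also assumes greedy decoding and uses hypothetical toy functions (`toy_ar_next`, `toy_draft`) in place of real models.

```python
from typing import Callable, List, Tuple

NextFn = Callable[[List[int]], int]          # context -> next greedy token
DraftFn = Callable[[List[int], int], List[int]]  # (context, k) -> k drafted tokens


def verify_draft(prefix: List[int], draft: List[int], ar_next: NextFn) -> List[int]:
    """Accept the longest draft prefix that agrees with the AR view (greedy).

    Real systems score all draft positions in one parallel forward pass;
    here ar_next is called per position only for readability.
    """
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in draft:
        target = ar_next(ctx)
        if tok != target:
            accepted.append(target)  # first mismatch: emit the AR token instead
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(ar_next(ctx))  # every draft token matched: one bonus AR token
    return accepted


def draft_verify_decode(prompt: List[int], n_new: int, draft_fn: DraftFn,
                        ar_next: NextFn, k: int = 4) -> Tuple[List[int], int]:
    """Decode n_new tokens; returns (sequence, number of verify steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        out.extend(verify_draft(out, draft_fn(out, k), ar_next))
        steps += 1
    return out[: len(prompt) + n_new], steps


def greedy_decode(prompt: List[int], n_new: int, ar_next: NextFn) -> List[int]:
    """Reference: plain one-token-per-step autoregressive decoding."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(ar_next(out))
    return out


# Hypothetical toy "models" over an 11-token vocabulary.
def toy_ar_next(ctx: List[int]) -> int:
    return (sum(ctx) * 7 + len(ctx)) % 11


def toy_draft(ctx: List[int], k: int) -> List[int]:
    ctx = list(ctx)
    draft = []
    for _ in range(k):
        t = toy_ar_next(ctx)
        if len(ctx) % 5 == 0:
            t = (t + 1) % 11  # deliberately wrong guess at some positions
        draft.append(t)
        ctx.append(t)
    return draft


if __name__ == "__main__":
    fast, steps = draft_verify_decode([1, 2, 3], 20, toy_draft, toy_ar_next)
    print(fast == greedy_decode([1, 2, 3], 20, toy_ar_next), steps)
```

Every emitted token is either a draft token confirmed by the AR view or a token the AR view produced itself. For that reason the result equals plain greedy decoding, even when the drafter makes mistakes. The speedup comes from accepting several tokens per verify step. Lossless decoding under sampling, rather than greedy decoding, needs a rejection-sampling acceptance rule instead of exact matching.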