Section 01
Orthrus: An Introduction to a Lossless Parallel Inference Acceleration Framework for LLMs Based on Dual-View Diffusion
This article introduces Orthrus, a framework that pairs the precise token-by-token generation of autoregressive LLMs with the parallel decoding ability of diffusion models. It achieves up to 7.8x faster inference while keeping the output strictly lossless, meaning output quality is not traded away for speed. At its core is a dual-view diffusion architecture built on a Qwen3 backbone. The framework supports MLX on Apple Silicon and incurs zero redundant memory overhead.
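To make "lossless parallel decoding" concrete, here is a minimal sketch of a generic draft-then-verify loop of the kind used in speculative decoding. It assumes that parallel drafts are checked against the autoregressive model's greedy choices. This is an assumption for illustration, not a description of Orthrus's actual mechanism. The names `toy_ar`, `toy_draft`, `greedy_decode`, and `draft_and_verify` are hypothetical stand-ins, not Orthrus or MLX APIs.

```python
from typing import Callable, List

# Hypothetical stand-ins: an autoregressive model that returns its greedy
# next token, and a parallel drafter that proposes k tokens at once.
NextToken = Callable[[List[int]], int]
Drafter = Callable[[List[int], int], List[int]]


def greedy_decode(ar: NextToken, prompt: List[int], n: int) -> List[int]:
    """Reference output: plain one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(n):
        out.append(ar(out))
    return out


def draft_and_verify(ar: NextToken, draft: Drafter, prompt: List[int],
                     n: int, k: int = 4) -> List[int]:
    """Accept drafted tokens only while they match the AR model's greedy choice.

    Every emitted token equals what greedy_decode would produce, so the
    result is identical (lossless); any speedup would come from accepting
    several drafted tokens per verification round.
    """
    out = list(prompt)
    target = len(prompt) + n
    while len(out) < target:
        proposal = draft(out, k)
        for tok in proposal:
            # In typical draft-and-verify systems these checks are batched
            # into a single forward pass; here they run one by one.
            if len(out) >= target or tok != ar(out):
                break
            out.append(tok)
        # Guarantee progress: append the AR model's own next token.
        if len(out) < target:
            out.append(ar(out))
    return out


def toy_ar(prefix: List[int]) -> int:
    """Toy deterministic 'model' over an 11-token vocabulary."""
    return (prefix[-1] * 3 + len(prefix)) % 11


def toy_draft(prefix: List[int], k: int) -> List[int]:
    """Toy imperfect drafter: usually agrees with toy_ar, sometimes errs."""
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = toy_ar(ctx)
        if len(ctx) % 5 == 0:
            tok = (tok + 1) % 11  # deliberate drafting error
        proposal.append(tok)
        ctx.append(tok)
    return proposal


if __name__ == "__main__":
    prompt = [1, 2]
    fast = draft_and_verify(toy_ar, toy_draft, prompt, n=20)
    assert fast == greedy_decode(toy_ar, prompt, n=20)
    print(fast)
```

Because rejected draft tokens are always replaced by the autoregressive model's own choice, drafter quality affects only how many tokens are accepted per round, never the final output.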