Section 01
YOCO-U: A New Transformer Architecture for Efficient Depth Expansion
YOCO-U combines the YOCO decoder architecture with recursive computation. By pairing a parameter-shared universal self-decoder with shallow, efficient attention layers, it expands model depth while keeping the KV cache constant in size and keeping pre-filling linear in sequence length. This addresses two inference-time pain points of standard Transformers, growing computational overhead and KV cache inflation, and offers a new direction for efficient inference-time compute scaling.
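To make the core idea concrete, here is a minimal, purely illustrative NumPy sketch of depth expansion via parameter sharing: a single set of self-decoder weights is applied recursively, so effective depth grows while the parameter count stays fixed. All names (`shared_self_decoder_step`, `W_qkv`, `W_out`) and the toy single-head attention are assumptions for illustration, not the YOCO-U implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# One shared set of weights, reused at every recursion step
# (hypothetical names; real YOCO-U layers differ).
W_qkv = rng.standard_normal((d, 3 * d)) / np.sqrt(d)
W_out = rng.standard_normal((d, d)) / np.sqrt(d)

def shared_self_decoder_step(x):
    """One weight-shared decoder step: toy single-head causal attention."""
    q, k, v = np.split(x @ W_qkv, 3, axis=-1)
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf  # causal mask: token i attends only to <= i
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return x + (attn @ v) @ W_out  # residual connection

x = rng.standard_normal((5, d))  # 5 tokens
depth = 6
h = x
for _ in range(depth):  # recursion expands effective depth...
    h = shared_self_decoder_step(h)

# ...while the parameter count stays constant regardless of depth.
n_params = W_qkv.size + W_out.size
print(h.shape, n_params)
```

Doubling `depth` changes only the loop count, not `n_params`, which is the sense in which recursive parameter sharing decouples depth from model size.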