Zing Forum

YOCO-U: A New Transformer Architecture for Efficient Depth Expansion via Recursive Computation

YOCO-U combines the YOCO decoder architecture with recursive computation. Using a parameter-shared universal self-decoder built from shallow efficient attention layers, it expands effective depth while keeping the KV cache constant and pre-filling linear, which offers a new direction for efficiently expanding inference-time computation.

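Tags: YOCO architecture, recursive computation, Transformer, KV cache optimization, test-time expansion, efficient inference, depth expansion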
Published 2026-04-02 01:58 | Recent activity 2026-04-02 10:49 | Estimated read 7 min

Section 01

YOCO-U: Introduction to the New Transformer Architecture for Efficient Depth Expansion

YOCO-U combines the YOCO decoder architecture with recursive computation. Using a parameter-shared universal self-decoder built from shallow efficient attention layers, it expands effective depth while keeping the KV cache constant and pre-filling linear. It targets two problems that standard Transformers face during inference, computational overhead and KV cache inflation, and offers a new direction for efficiently expanding inference-time computation.

Section 02

Dilemmas of Inference-Time Expansion and Background of Existing Technologies

Rise and Dilemmas of Inference-Time Computation

In recent years, test-time expansion techniques have improved the reasoning capabilities of large language models. Standard Transformers, however, hit two bottlenecks when extra computation is added at inference: high computational overhead (attention is recalculated in each iteration) and KV cache inflation (the cache grows linearly with depth). Together these make test-time expansion expensive.

Advantages of the YOCO Architecture

The YOCO architecture adopts a decoder-decoder structure. The shallow efficient attention layers need only a constant-size cache, and they produce a single global KV cache that the later layers share instead of each storing their own. Pre-filling complexity is linear in sequence length, which makes YOCO more efficient for processing long sequences.
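To make the cache argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the YOCO-U work: the function names, the sliding-window form of efficient attention, the use of the full model width for keys and values, and the example sizes are all assumptions chosen for illustration.

```python
def standard_kv_cache_bytes(seq_len, n_layers, d_model, bytes_per_value=2):
    """Cache for a standard decoder: every layer stores K and V for every token."""
    return 2 * n_layers * seq_len * d_model * bytes_per_value


def yoco_style_kv_cache_bytes(seq_len, n_self_layers, window, d_model,
                              bytes_per_value=2):
    """Cache for a decoder-decoder layout (assumed sliding-window self-decoder)."""
    # Each efficient-attention layer keeps at most `window` tokens.
    self_cache = 2 * n_self_layers * min(window, seq_len) * d_model * bytes_per_value
    # The global K/V is produced once and shared by all later layers.
    global_cache = 2 * seq_len * d_model * bytes_per_value
    return self_cache + global_cache


# Hypothetical configuration: 64K tokens, 32 layers, width 4096, 16-bit values.
standard = standard_kv_cache_bytes(65536, 32, 4096)
shared = yoco_style_kv_cache_bytes(65536, 16, 1024, 4096)
print(f"{standard / 2**30:.2f} GiB")  # 32.00 GiB
print(f"{shared / 2**30:.2f} GiB")    # 1.25 GiB
```

Under these assumptions the per-layer term disappears from the sequence-length-dependent part of the cache, which is where the saving comes from.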

Potential and Limitations of Recursive Computation

Recursive computation can deepen representations, but used on its own it brings high computational overhead and cache inflation. It has to be paired with efficient cache management before the two techniques reinforce each other.

Section 03

YOCO-U Architecture Design and Technical Details

Core Design of YOCO-U

YOCO-U combines YOCO with recursive computation. Its core is the universal self-decoder, which applies the shallow efficient attention layers several times with shared parameters. The shallow layers recursively refine the representations while the deep standard decoder extracts semantics, and the KV cache stays constant as iterations are added.
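The idea of looping one shared shallow block before producing a single global cache can be sketched as a toy in pure Python. This is an illustrative assumption, not the authors' implementation: the block is reduced to sliding-window attention with a residual connection, the only "parameter" is a scalar `scale`, and all function names are hypothetical.

```python
import math


def sliding_window_attention(h, window):
    """Causal attention where token i only sees the last `window` tokens."""
    out = []
    for i, q in enumerate(h):
        keys = h[max(0, i - window + 1): i + 1]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) / total
                    for j in range(len(q))])
    return out


def shared_block(h, scale, window):
    """One self-decoder step: windowed attention plus a residual connection."""
    attn = sliding_window_attention(h, window)
    return [[x + scale * a for x, a in zip(row, arow)] for row, arow in zip(h, attn)]


def universal_self_decoder(h, scale, window, n_loops):
    """Apply the SAME block (same `scale`) n_loops times: more depth, no new parameters."""
    for _ in range(n_loops):
        h = shared_block(h, scale, window)
    return h


def global_kv(h):
    """Built once from the final self-decoder output (toy: identity projections)."""
    return {"keys": h, "values": h}


tokens = [[0.1 * (i + j) for j in range(4)] for i in range(6)]
for loops in (1, 4):
    refined = universal_self_decoder(tokens, scale=0.5, window=3, n_loops=loops)
    print(loops, len(global_kv(refined)["keys"]))  # cache length is 6 in both cases
```

In this toy, increasing `n_loops` changes the refined representations but neither the parameter count nor the number of entries in the global cache.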

Key Technical Details

  1. Recursion Placement: Recursion is restricted to the shallow layers, which handle local patterns and low-level features;
  2. Parameter Sharing: The repeated layers reuse the same weights, so the parameter count stays unchanged and the model learns a universal refinement strategy;
  3. Adaptive Termination: The recursive depth is determined by the complexity of the input.
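The adaptive termination item above does not say how the stopping decision is made. One common heuristic, shown here purely as an assumption, is to stop looping once an iteration barely changes the representation. The function name `refine_until_stable`, the tolerance, and the toy `step` function are all hypothetical.

```python
import math


def refine_until_stable(h, step, max_loops=8, tol=1e-2):
    """Apply `step` repeatedly; stop early once the relative update is below `tol`."""
    loops = 0
    for loops in range(1, max_loops + 1):
        new_h = step(h)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_h, h)))
        norm = math.sqrt(sum(a * a for a in h)) or 1.0
        h = new_h
        if change / norm < tol:
            break
    return h, loops


def toy_step(v):
    """Stand-in for a shared refinement block; it converges toward 2.0."""
    return [0.5 * x + 1.0 for x in v]


_, easy_loops = refine_until_stable([2.0, 2.0], toy_step)   # already stable
_, hard_loops = refine_until_stable([0.0, 40.0], toy_step)  # far from stable
print(easy_loops, hard_loops)
```

An input that is already stable exits after a single pass, while a harder one keeps iterating until it hits the `max_loops` cap, which bounds worst-case latency.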

Section 04

Experimental Verification Results of YOCO-U

General Benchmark Tests

Compared with non-recursive YOCO models of the same scale, YOCO-U shows significant improvements on multiple tasks, especially multi-step reasoning, with only a limited increase in inference latency.

Long Context Tests

Because the KV cache stays constant, the model can handle long documents of tens of thousands of tokens. The recursive mechanism captures long-distance dependencies better, which leads to excellent performance on document-level understanding tasks.

Expansion Behavior

As the recursive depth increases, the model's capabilities continue to improve while the computational cost grows gently, which compares favorably with the linear or superlinear growth of standard Transformers.
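A crude cost model helps show why repeating only the shallow layers grows cost slowly. The sketch below counts attention query-key pairs only and ignores feed-forward and projection costs. The sliding-window assumption, the layer split, and both function names are illustrative assumptions rather than figures from the experiments.

```python
def looped_standard_cost(seq_len, n_layers, n_loops):
    """Relative attention cost when every full-attention layer is repeated."""
    return n_loops * n_layers * seq_len * seq_len


def shallow_loop_cost(seq_len, n_self_layers, n_deep_layers, window, n_loops):
    """Relative attention cost when only windowed shallow layers are repeated."""
    # The repeated part is linear in sequence length.
    self_cost = n_loops * n_self_layers * seq_len * min(window, seq_len)
    # The deep layers run once, whatever the recursion depth.
    deep_cost = n_deep_layers * seq_len * seq_len
    return self_cost + deep_cost


for loops in (1, 2, 4):
    full = looped_standard_cost(8192, 32, loops) / looped_standard_cost(8192, 32, 1)
    shallow = (shallow_loop_cost(8192, 16, 16, 1024, loops)
               / shallow_loop_cost(8192, 16, 16, 1024, 1))
    print(loops, f"{full:.2f}x", f"{shallow:.2f}x")
# 1 1.00x 1.00x
# 2 2.00x 1.11x
# 4 4.00x 1.33x
```

With these hypothetical sizes, four loops quadruple the cost of the fully repeated model but add only about a third to the shallow-loop variant.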

Section 05

Architectural Insights and Application Prospects of YOCO-U

Architectural Design Insights

  1. Multi-dimensional Collaboration: Combine the complementary strengths of different techniques;
  2. Fine-grained Resource Allocation: Give shallow and deep layers different roles;
  3. Smart Computation: Optimize test-time expansion through architectural innovation.

Application Prospects

YOCO-U is suited to long-document processing (legal analysis, medical reviews), deep reasoning (mathematical proofs, code debugging), and resource-constrained environments (edge devices, real-time systems). The recursive depth can be adjusted dynamically to balance quality and speed.

Conclusion

YOCO-U is an important milestone in the evolution of Transformers. Through architectural innovation, it achieves depth expansion without sacrificing efficiency, providing a sustainable path for test-time expansion.

Section 06

Limitations of YOCO-U and Future Research Directions

Current Limitations

  • The mechanism for choosing the recursive depth adaptively still needs optimization;
  • More research is needed on adapting the recursive mechanism to specific tasks.

Future Directions

  • Explore more complex structures such as hierarchical or conditional recursion;
  • Combine with efficiency techniques such as sparse attention and quantization;
  • Apply the approach to multi-modal models.