# YOCO-U: A New Transformer Architecture for Efficient Depth Expansion via Recursive Computation

> YOCO-U combines the YOCO decoder-decoder architecture with recursive computation. A parameter-shared universal self-decoder loops over shallow efficient-attention layers, which adds depth while keeping the KV cache constant and pre-filling linear. This opens a new direction for scaling inference-time computation efficiently.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-01T17:58:21.000Z
- Last activity: 2026-04-02T02:49:22.446Z
- Popularity: 131.2
- Keywords: YOCO architecture, recursive computation, Transformer, KV cache optimization, test-time scaling, efficient inference, depth scaling
- Page link: https://www.zingnex.cn/en/forum/thread/yoco-u-transformer
- Canonical: https://www.zingnex.cn/forum/thread/yoco-u-transformer

---

## YOCO-U: Introduction to the New Transformer Architecture for Efficient Depth Expansion

YOCO-U targets two costs that standard Transformers incur when computation is scaled at inference time: the extra compute of repeated passes and a KV cache that grows with depth. It pairs the YOCO decoder-decoder architecture with recursive computation: a parameter-shared universal self-decoder iterates over shallow efficient-attention layers, so the model gains depth while the KV cache stays constant and pre-filling stays linear.

## Challenges of Inference-Time Scaling and Background on Existing Techniques

### The Rise and Challenges of Inference-Time Computation
In recent years, test-time scaling techniques have improved the reasoning capabilities of large language models. Standard Transformers, however, make such scaling expensive for two reasons: high computational overhead, because attention is recomputed in every iteration, and KV cache inflation, because the cache grows linearly with depth.
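The linear growth of the cache with depth is simple arithmetic. The sketch below assumes a standard decoder-only Transformer in which every layer stores one key and one value per token and per KV head; the function name and the 32-layer, 8-KV-head, 128-dimension, fp16 configuration in the usage lines are hypothetical.

```python
def transformer_kv_cache_bytes(n_layers: int, seq_len: int, n_kv_heads: int,
                               head_dim: int, bytes_per_elem: int = 2,
                               loops: int = 1) -> int:
    """KV cache of a standard decoder-only Transformer.

    Each layer stores one key and one value vector per token and per KV head.
    Naive recursion that re-runs the stack `loops` times with a fresh cache per
    pass multiplies the effective depth, and hence the cache, by `loops`.
    """
    effective_depth = n_layers * loops
    return 2 * effective_depth * seq_len * n_kv_heads * head_dim * bytes_per_elem


GiB = 1024 ** 3
print(transformer_kv_cache_bytes(32, 32768, 8, 128) / GiB)           # 4.0
print(transformer_kv_cache_bytes(32, 32768, 8, 128, loops=2) / GiB)  # 8.0
```

At 32,768 tokens this hypothetical model needs 4 GiB of cache, and running the stack twice with a fresh cache per pass doubles that to 8 GiB.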

### Advantages of the YOCO Architecture
The YOCO ("You Only Cache Once") architecture adopts a decoder-decoder structure. Shallow self-decoder layers built on efficient attention produce a single global KV cache, which all subsequent cross-decoder layers share, so the cache does not grow with the number of layers. Pre-filling cost is linear in sequence length, which makes YOCO efficient on long sequences.
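A rough accounting sketch for the YOCO side follows. It assumes the self-decoder uses sliding-window attention (YOCO also allows other efficient attention, such as gated retention) and that the global cache has the same per-token width as one Transformer layer's cache. The function name and parameters are illustrative, not taken from the YOCO code.

```python
def yoco_kv_cache_bytes(n_self_layers: int, n_cross_layers: int, window: int,
                        seq_len: int, n_kv_heads: int, head_dim: int,
                        bytes_per_elem: int = 2) -> int:
    """Rough KV cache estimate for a YOCO-style decoder-decoder.

    Assumes sliding-window attention in the self-decoder, so each of its layers
    keeps at most `window` tokens, and one global K/V cache of `seq_len` tokens
    shared by all cross-decoder layers. `n_cross_layers` is deliberately unused:
    adding cross-decoder layers does not add cache.
    """
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem  # one key + one value
    self_cache = n_self_layers * min(window, seq_len) * per_token
    global_cache = seq_len * per_token  # cached once, reused by every cross layer
    return self_cache + global_cache


MiB = 1024 ** 2
print(yoco_kv_cache_bytes(16, 16, 1024, 32768, 8, 128) / MiB)  # 192.0
print(yoco_kv_cache_bytes(16, 48, 1024, 32768, 8, 128) / MiB)  # 192.0
```

Under these assumptions, 16 self-decoder layers with a 1,024-token window need about 192 MiB at 32,768 tokens, and tripling the cross-decoder depth leaves the estimate unchanged.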

### Potential and Limitations of Recursive Computation
Recursive computation, i.e., applying the same layers repeatedly, can deepen representations, but on its own it inherits the same problems: higher computational overhead and cache inflation. It needs to be combined with efficient cache management for the two to work together.

## YOCO-U Architecture Design and Technical Details

### Core Design of YOCO-U
YOCO-U combines YOCO with recursive computation. Its core is the universal self-decoder, which iterates several times over the shallow efficient-attention layers using shared parameters. The deeper standard decoder layers extract semantics, while the shallow layers recursively refine representations; because the recursion stays inside efficient-attention layers, the KV cache remains constant.
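The data flow described above can be sketched in NumPy. This is a toy under loud assumptions: a single attention head, no normalization or feed-forward sublayers, sliding-window attention as the efficient attention, random weights, and full-sequence (pre-fill style) computation rather than incremental decoding. All function names are hypothetical and do not come from the YOCO-U code.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, WINDOW = 16, 12, 4  # toy hidden size, sequence length, sliding-window size


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def init_layer(names="qkvo"):
    # Single-head projection matrices, one (D, D) matrix per name.
    return {name: rng.normal(0, D ** -0.5, (D, D)) for name in names}


def sliding_window_attention(x, w, window):
    # Causal attention in which token i attends only to tokens i-window+1 .. i,
    # so a decoder would need to cache at most `window` keys/values per pass.
    q, k, v = x @ w["q"], x @ w["k"], x @ w["v"]
    scores = q @ k.T / np.sqrt(D)
    i = np.arange(len(x))[:, None]
    j = np.arange(len(x))[None, :]
    scores = np.where((j > i) | (j <= i - window), -np.inf, scores)
    return softmax(scores) @ v @ w["o"]


def universal_self_decoder(x, shared_layer, n_loops):
    # The SAME weights are applied on every loop: depth grows, parameters do not.
    for _ in range(n_loops):
        x = x + sliding_window_attention(x, shared_layer, WINDOW)  # residual update
    return x


def cross_decoder(h, global_kv, layers):
    # Every cross-decoder layer attends (causally) to the same global K/V,
    # which were projected once from the self-decoder output.
    k, v = global_kv
    future = np.triu(np.ones((len(h), len(h)), dtype=bool), k=1)
    x = h
    for w in layers:
        scores = (x @ w["q"]) @ k.T / np.sqrt(D)
        x = x + softmax(np.where(future, -np.inf, scores)) @ v @ w["o"]
    return x


shared = init_layer()                                # one set of self-decoder weights
cross_layers = [init_layer("qo") for _ in range(3)]  # cross layers need only Q and O
w_k, w_v = rng.normal(0, D ** -0.5, (2, D, D))       # global K/V projections

x = rng.normal(size=(N, D))
h = universal_self_decoder(x, shared, n_loops=4)
global_kv = (h @ w_k, h @ w_v)                       # the only full-length K/V
out = cross_decoder(h, global_kv, cross_layers)
print(out.shape)  # (12, 16)
```

Because `universal_self_decoder` reuses `shared` on every loop, raising `n_loops` adds depth without adding parameters, and `global_kv` is the only key/value tensor that spans the full sequence.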

### Key Technical Details
1. **Placement of Recursion**: Recursion is restricted to shallow layers, which handle local patterns and low-level features;
2. **Parameter Sharing**: Every iteration reuses the same weights, so the parameter count stays unchanged and the model learns a general-purpose refinement step;
3. **Adaptive Termination**: The recursion depth is chosen according to input complexity.
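The source does not specify how adaptive termination works. One common pattern, shown here only as an assumption, is to stop iterating once the refinement barely changes the hidden state; learned halting units (as in Adaptive Computation Time) are another option. `refine_until_converged` and its `tol` threshold are hypothetical, and a real model would more likely halt per token than per sequence.

```python
import numpy as np


def refine_until_converged(x, step, max_loops=8, tol=1e-3):
    """Apply a weight-shared refinement `step` until its update becomes small.

    Hypothetical halting rule: stop once ||step(x) - x|| <= tol * ||x||,
    or after `max_loops` iterations. Returns (final state, loops used).
    """
    for loop in range(1, max_loops + 1):
        x_new = step(x)
        converged = np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x)
        x = x_new
        if converged:
            return x, loop
    return x, max_loops


state, used = refine_until_converged(np.array([0.0]), lambda x: 0.5 * x + 1.0, max_loops=20)
print(used)  # 10
```

With this contraction the rule stops after 10 loops near the fixed point 2.0; an input whose refinement converges more slowly would consume more loops, which is the sense in which depth adapts to input difficulty.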

## Experimental Results of YOCO-U

### General Benchmark Tests
Compared with non-recursive YOCO models of the same size, YOCO-U improves markedly on multiple tasks, especially multi-step reasoning, with only a limited increase in inference latency.

### Long Context Tests
Because the KV cache stays constant, YOCO-U can process documents of tens of thousands of tokens. The recursive mechanism also captures long-range dependencies better, which yields strong performance on document-level understanding tasks.

### Scaling Behavior
As the recursion depth increases, the model's capabilities keep improving while computational cost grows only gently, compared with the linear or superlinear growth seen in standard Transformers.
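A toy bookkeeping model can make the "gentle growth" claim concrete. It assumes that naive recursion re-runs the whole stack and keeps a full-length cache for every pass, whereas a YOCO-U-style model loops only the self-decoder (whose sliding-window cache is bounded by the window size) and stores the global cache once. The function and its cost units (layer passes, and cached token entries per layer pass) are hypothetical simplifications.

```python
def looped_costs(loops: int, n_self: int, n_cross: int, seq_len: int, window: int):
    """Toy per-token bookkeeping for recursion (hypothetical model).

    Returns (naive layer passes, naive cache entries,
             YOCO-U-style layer passes, YOCO-U-style cache entries),
    where a cache entry is one token's K/V in one layer pass.
    """
    # Naive recursion: re-run the whole stack; every pass keeps a full-length cache.
    naive_passes = loops * (n_self + n_cross)
    naive_cache = naive_passes * seq_len
    # YOCO-U-style: loop only the self-decoder (sliding-window cache bounded by
    # `window` per pass), run the cross-decoder once, store the global cache once.
    yocou_passes = loops * n_self + n_cross
    yocou_cache = loops * n_self * min(window, seq_len) + seq_len
    return naive_passes, naive_cache, yocou_passes, yocou_cache


for loops in (1, 2, 4):
    print(loops, looped_costs(loops, n_self=12, n_cross=12, seq_len=32768, window=1024))
```

For 4 loops over a 12 + 12 layer model at 32,768 tokens with a 1,024-token window, this gives 96 layer passes and 3,145,728 cache entries for naive recursion versus 60 passes and 81,920 entries for the YOCO-U-style layout. Each extra loop adds cache bounded by the window, not by sequence length. These figures follow from the stated assumptions and are not measured results.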

## Architectural Insights and Application Prospects of YOCO-U

### Architectural Design Insights
1. **Multi-dimensional Collaboration**: Combine the complementary strengths of different techniques;
2. **Fine-grained Resource Allocation**: Shallow and deep layers take on different roles;
3. **Smart Computation**: Improve test-time scaling through architectural innovation.

### Application Prospects
YOCO-U suits long-document processing (legal analysis, medical reviews), deep reasoning (mathematical proofs, code debugging), and resource-constrained environments (edge devices, real-time systems). Because the recursion depth can be adjusted dynamically, deployments can trade quality against speed.
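Adjusting recursion depth at run time could be as simple as picking the largest loop count that fits a latency budget. The linear latency model (`base_ms` plus `per_loop_ms` per loop) and the `pick_loops` function are assumptions for illustration, not part of YOCO-U; in practice the per-loop cost would be measured on the target device.

```python
def pick_loops(latency_budget_ms: float, base_ms: float, per_loop_ms: float,
               max_loops: int) -> int:
    """Largest loop count whose estimated latency fits the budget.

    Hypothetical linear latency model: base_ms + loops * per_loop_ms.
    At least one pass always runs, even if the budget is too tight for it.
    """
    affordable = int((latency_budget_ms - base_ms) // per_loop_ms)
    return max(1, min(max_loops, affordable))


print(pick_loops(30, 10, 4, 8))   # 5: 10 + 5 * 4 = 30 ms fits exactly
print(pick_loops(100, 10, 4, 8))  # 8: capped at max_loops
print(pick_loops(5, 10, 4, 8))    # 1: budget too tight, still run one pass
```

An edge device with a tight budget would settle on few loops, while a server handling a hard reasoning query could allow up to `max_loops`.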

### Conclusion
YOCO-U is an important milestone in the evolution of Transformers. Through architectural innovation, it achieves depth scaling without sacrificing efficiency, offering a sustainable path for test-time scaling.

## Limitations of YOCO-U and Future Research Directions

### Current Limitations
- The adaptive recursion-depth mechanism still needs optimization;
- More research is needed on adapting the recursive mechanism to specific tasks.

### Future Directions
- Explore more complex structures such as hierarchical or conditional recursion;
- Combine YOCO-U with other efficiency techniques such as sparse attention and quantization;
- Extend the approach to multimodal models.
