# AdaSR: Adaptive Streaming Reasoning Framework and Hierarchical Relative Policy Optimization

> This article introduces AdaSR, an adaptive framework that enables large models to perform reasoning during input streaming. It achieves hierarchical reasoning optimization through HRPO technology, striking a better balance between reasoning accuracy, computational efficiency, and streaming latency.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-06-12T17:56:25.000Z
- 最近活动: 2026-06-15T03:51:38.900Z
- 热度: 93.1
- 关键词: 流式推理, 自适应推理, 强化学习, RLVR, HRPO, 分层优化, 实时AI, 计算效率
- 页面链接: https://www.zingnex.cn/en/forum/thread/adasr
- Canonical: https://www.zingnex.cn/forum/thread/adasr
- Markdown 来源: floors_fallback

---

## 【Introduction】AdaSR: Adaptive Streaming Reasoning Framework and HRPO Hierarchical Optimization Technology

This article introduces the AdaSR framework proposed in the arXiv paper (2606.14694v1), which aims to address the limitations of the traditional "read-first-then-think" reasoning paradigm in dynamic scenarios (e.g., audio streams, real-time sensor data). Through a hierarchical reasoning architecture (streaming + deep stages) and the HRPO (Hierarchical Relative Policy Optimization) algorithm, this framework achieves adaptive computation allocation, striking a better balance between reasoning accuracy, computational efficiency, and streaming latency.

## 【Background】Limitations of Traditional Reasoning and Challenges of Streaming Reasoning

Traditional reasoning follows the "read-first-then-think" paradigm, which is only suitable for static inputs and cannot meet the needs of continuous information inflow in dynamic scenarios. Streaming reasoning needs to satisfy requirements such as real-time response, decision-making based on partial observations, dynamic resource allocation, and latency-accuracy trade-off. However, existing methods rely on supervised imitation learning with pre-constructed trajectories, which have problems like insufficient flexibility, poor adaptability, and coarse optimization granularity.

## 【Methodology】Design of AdaSR Hierarchical Reasoning Framework

The AdaSR framework consists of two stages: 1. Streaming reasoning stage: Perform incremental updates when input arrives continuously, with lightweight computation and maintenance of internal state; 2. Deep reasoning stage: Conduct global optimization and final deliberation based on complete information after input is finished. In addition, the framework introduces an adaptive computation allocation mechanism to dynamically allocate resources according to input characteristics and task complexity.

## 【Methodology】HRPO Hierarchical Relative Policy Optimization Algorithm

HRPO is an extension of GRPO, designed for hierarchical reasoning scenarios: 1. Fine-grained advantage allocation: Divide optimization into streaming and deep stages, assign different advantage values to each stage to achieve stage-specific optimization; 
2. Multi-dimensional rewards: Including format rewards (to standardize reasoning protocols), accuracy rewards (to ensure final performance), and adaptive thinking rewards (to encourage latency-aware computation allocation).

## 【Evidence】Experimental Performance Analysis of AdaSR

Experiments show that AdaSR outperforms supervised fine-tuning baselines: 
1. Accuracy: Incremental reasoning and two-stage collaboration improve benchmark performance; 
2. Computational efficiency: Adaptive allocation avoids one-size-fits-all patterns and saves resources; 
3. Streaming latency: Fast first-token response, smooth incremental updates, and high-quality final answers.

## 【Applications】Practical Scenario Value of AdaSR

AdaSR is applicable to multiple scenarios: 
1. Real-time audio and video understanding (video conferences, live stream analysis, etc.); 
2. Interactive AI assistants (real-time understanding of user input, natural conversation rhythm); 
3. Sensor data processing (real-time perception and decision-making in IoT and autonomous driving).

## 【Summary and Outlook】Contributions and Future Directions of AdaSR

The contributions of AdaSR include a hierarchical reasoning paradigm, an adaptive optimization mechanism, and a fine-grained RLVR method, with open-source code and a universal framework. Future directions can explore more hierarchical architectures, token-level computation control, multi-modal extensions, and hardware co-optimization to promote the development of real-time AI reasoning.
