AdaSR: Adaptive Streaming Reasoning Framework and Hierarchical Relative Policy Optimization

This article introduces AdaSR, an adaptive framework that lets large models reason while their input is still streaming in. It optimizes the reasoning hierarchy with Hierarchical Relative Policy Optimization (HRPO), aiming for a better balance between reasoning accuracy, computational efficiency, and streaming latency.

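Tags: Streaming reasoning, Adaptive reasoning, Reinforcement learning, RLVR, HRPO, Hierarchical optimization, Real-time AI, Computational efficiency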
Published 2026-06-13 01:56 | Recent activity 2026-06-15 11:51 | Estimated read 6 min

Section 01

【Introduction】AdaSR: Adaptive Streaming Reasoning Framework and HRPO Hierarchical Optimization Technology

This article introduces AdaSR, a framework proposed in the arXiv paper 2606.14694v1. It targets the limitations of the traditional "read-first-then-think" reasoning paradigm in dynamic scenarios such as audio streams and real-time sensor data. By combining a hierarchical reasoning architecture (a streaming stage followed by a deep stage) with the HRPO (Hierarchical Relative Policy Optimization) algorithm, the framework allocates computation adaptively and aims for a better balance between reasoning accuracy, computational efficiency, and streaming latency.

Section 02

【Background】Limitations of Traditional Reasoning and Challenges of Streaming Reasoning

Traditional reasoning follows the "read-first-then-think" paradigm, which suits static inputs but cannot keep up with dynamic scenarios where information keeps arriving. Streaming reasoning has to respond in real time, make decisions from partial observations, allocate resources dynamically, and trade latency against accuracy. Existing methods, however, rely on supervised imitation learning over pre-constructed trajectories, which limits their flexibility and adaptability and restricts optimization to a coarse granularity.

Section 03

【Methodology】Design of AdaSR Hierarchical Reasoning Framework

The AdaSR framework consists of two stages:

  1. Streaming reasoning stage: As input arrives, the model performs lightweight incremental updates and maintains an internal state;
  2. Deep reasoning stage: Once the input is complete, the model performs global optimization and final deliberation over the full information.

In addition, the framework introduces an adaptive computation allocation mechanism that allocates resources dynamically according to input characteristics and task complexity.
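
The two-stage flow described here can be pictured with a minimal Python sketch. This is not the paper's implementation: the names StreamState, streaming_budget, streaming_step, deep_step, and run are hypothetical, the model calls are replaced by string placeholders, and the "longer chunk gets a larger budget" rule is only an assumed stand-in for adaptive computation allocation.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class StreamState:
    """Running state kept across chunks (hypothetical structure)."""
    notes: List[str] = field(default_factory=list)
    tokens_spent: int = 0


def streaming_budget(chunk: str, base: int = 8, cap: int = 64) -> int:
    """Toy adaptive allocation (assumption): more words, more thinking tokens, up to a cap."""
    return min(cap, base + len(chunk.split()))


def streaming_step(state: StreamState, chunk: str) -> None:
    """Streaming stage: a lightweight incremental update for each arriving chunk."""
    budget = streaming_budget(chunk)
    # Placeholder for a short model call limited to `budget` tokens.
    state.notes.append(f"note[{budget}]: {chunk[:40]}")
    state.tokens_spent += budget


def deep_step(state: StreamState, question: str, budget: int = 256) -> str:
    """Deep stage: final deliberation once the whole input has been seen."""
    state.tokens_spent += budget
    # Placeholder for a longer model call that reads all accumulated notes.
    return f"answer to {question!r} using {len(state.notes)} notes"


def run(chunks: Iterable[str], question: str) -> Tuple[str, StreamState]:
    """Consume the stream chunk by chunk, then deliberate on the full input."""
    state = StreamState()
    for chunk in chunks:
        streaming_step(state, chunk)
    return deep_step(state, question), state
```

Calling run(["a b c", "d e"], "q") performs two streaming updates followed by one deep step. In a real system, the placeholders would be model calls, and the budget rule would be learned rather than hand-written.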

Section 04

【Methodology】HRPO Hierarchical Relative Policy Optimization Algorithm

HRPO extends GRPO (Group Relative Policy Optimization) to hierarchical reasoning scenarios:

  1. Fine-grained advantage allocation: Optimization is split into the streaming and deep stages, and each stage receives its own advantage values so that it can be optimized separately;
  2. Multi-dimensional rewards: These include format rewards (to standardize the reasoning protocol), accuracy rewards (to ensure final performance), and adaptive thinking rewards (to encourage latency-aware computation allocation).
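
To make stage-specific advantages concrete, here is a rough Python sketch. Only the GRPO-style normalization (reward minus the group mean, divided by the group standard deviation) is standard. Everything else is an assumption for illustration: this summary does not give HRPO's formulas, so the way reward components are split between stages and the names group_relative and stage_advantages are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_relative(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def stage_advantages(rollouts: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Compute one advantage per stage for every rollout in a group.

    Assumed split (not from the paper): the streaming stage is scored by
    format + adaptive-thinking reward, the deep stage by format + accuracy.
    """
    stream_rewards = [r["format"] + r["adaptive"] for r in rollouts]
    deep_rewards = [r["format"] + r["accuracy"] for r in rollouts]
    a_stream = group_relative(stream_rewards)
    a_deep = group_relative(deep_rewards)
    # Tokens generated in each stage would be weighted by that stage's advantage.
    return [{"stream": s, "deep": d} for s, d in zip(a_stream, a_deep)]


# A group of three sampled rollouts for the same streamed input.
group = [
    {"format": 1.0, "accuracy": 1.0, "adaptive": 0.5},
    {"format": 1.0, "accuracy": 0.0, "adaptive": 1.0},
    {"format": 0.0, "accuracy": 0.0, "adaptive": 0.0},
]
advantages = stage_advantages(group)
```

In this toy group the first rollout gets the largest deep-stage advantage because it is the only correct one, while the second gets the largest streaming-stage advantage. A single rollout-level advantage could not express that difference.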

Section 05

【Evidence】Experimental Performance Analysis of AdaSR

Experiments show that AdaSR outperforms supervised fine-tuning baselines:

  1. Accuracy: Incremental reasoning and two-stage collaboration improve benchmark performance;
  2. Computational efficiency: Adaptive allocation avoids one-size-fits-all patterns and saves resources;
  3. Streaming latency: Fast first-token response, smooth incremental updates, and high-quality final answers.

Section 06

【Applications】Practical Scenario Value of AdaSR

AdaSR is applicable to multiple scenarios:

  1. Real-time audio and video understanding (video conferences, live stream analysis, etc.);
  2. Interactive AI assistants (real-time understanding of user input, natural conversation rhythm);
  3. Sensor data processing (real-time perception and decision-making in IoT and autonomous driving).

Section 07

【Summary and Outlook】Contributions and Future Directions of AdaSR

AdaSR contributes a hierarchical reasoning paradigm, an adaptive optimization mechanism, and a fine-grained RLVR (reinforcement learning with verifiable rewards) method, together with open-source code and a general-purpose framework. Future work could explore richer hierarchical architectures, token-level computation control, multi-modal extensions, and hardware co-optimization to advance real-time AI reasoning.