# Subgoal Persistence in Hierarchical Latent Reasoning: When Should We Replan?

> This paper investigates how long a subgoal should persist in hierarchical latent reasoning models. It finds that a moderate persistence period (P=3-6 steps) is optimal: periods that are too short or too long both degrade performance. The result offers a guiding principle for the design of combinatorial planning systems.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-02T14:55:47.000Z
- Last activity: 2026-06-03T05:54:15.814Z
- Popularity: 132.0
- Keywords: latent reasoning, hierarchical reasoning, subgoal planning, ARC benchmark, combinatorial planning, long-range reasoning
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2606-03741v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2606-03741v1

---

## Introduction: Core Findings on Subgoal Persistence in Hierarchical Latent Reasoning

This paper is from arXiv (published in June 2026, original title: *When to Re-Plan: Subgoal Persistence in Hierarchical Latent Reasoning*). It focuses on the trade-off in subgoal duration in hierarchical latent reasoning models. Experiments show that a moderate persistence period (P=3-6 steps) works best: replanning too often or holding a subgoal too long both degrade performance. This gives a guiding principle for the design of combinatorial planning systems.

## Research Background: Stability-Adaptability Dilemma in Long-Range Reasoning

Long-range reasoning requires an agent to keep its goals consistent while still adjusting its strategy, which creates a stability-adaptability trade-off: replanning too frequently makes the agent short-sighted, while committing for too long leaves it pursuing outdated subgoals. Explicit chain-of-thought reasoning has drawbacks such as high token consumption. Latent reasoning instead moves multi-step computation into hidden states, offering a new direction for long-range reasoning.

## Model Architecture: Manager-Worker Mechanism in Hierarchical Latent Reasoning

The model extends the Hierarchical Reasoning Model (HRM) with a manager-worker interface: the manager generates directional subgoals at low frequency, while the worker executes subgoal-guided reasoning steps at high frequency. The subgoal persistence mechanism uses a hidden-state bias and an intrinsic alignment loss to keep each subgoal in effect for P steps.
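The persistence schedule can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's implementation: the names `manager`, `worker_step`, `bias_scale`, and `run` are hypothetical, the "hidden state" is a short list of floats, and the additive bias followed by `tanh` is only an assumed stand-in for the hidden-state bias the summary mentions.

```python
import math
import random


def manager(state, rng):
    """Hypothetical manager: emit a unit-norm directional subgoal."""
    raw = [s + rng.gauss(0.0, 1.0) for s in state]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def worker_step(state, subgoal, bias_scale=0.1):
    """Hypothetical worker: toy update with an additive subgoal bias."""
    return [math.tanh(s + bias_scale * g) for s, g in zip(state, subgoal)]


def run(num_steps, persistence, dim=4, seed=0):
    """Run the worker for num_steps, replanning every `persistence` steps."""
    rng = random.Random(seed)
    state = [0.0] * dim
    subgoal = None
    replan_steps = []
    for t in range(num_steps):
        # The manager fires only at multiples of P; in between, the
        # worker keeps following the same subgoal.
        if t % persistence == 0:
            subgoal = manager(state, rng)
            replan_steps.append(t)
        state = worker_step(state, subgoal)
    return state, replan_steps
```

With `persistence=1` the manager replans at every step, and with a large `persistence` a single subgoal steers the whole rollout. These are the two extremes the paper compares against the moderate range.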

## Key Findings: Moderate Subgoal Period (P=3-6) is Optimal

In experiments on the ARC benchmark, P=3 achieves the best performance (loss=1.544), and the range P=3-6 outperforms both P=1 (replanning too frequently) and longer periods (too rigid). The optimal weight for the intrinsic alignment loss is λ≈0.05: a smaller weight fails to provide guidance, while a larger one disrupts effective structures.
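The role of λ can be made concrete with a small sketch. The summary does not give the form of the intrinsic alignment loss, so the code assumes one plausible choice: one minus the cosine similarity between the worker's state change and the subgoal direction, added to the task loss with weight λ. The function names and this cosine form are assumptions, not the paper's definition.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(prev_state, next_state, subgoal):
    """Assumed form: 0 when the state moves along the subgoal, 2 when opposite."""
    delta = [n - p for p, n in zip(prev_state, next_state)]
    return 1.0 - cosine(delta, subgoal)


def total_loss(task_loss, align_loss, lam=0.05):
    """Task loss plus the lambda-weighted auxiliary alignment term."""
    return task_loss + lam * align_loss
```

Under this assumed form, a small λ leaves the alignment term with almost no influence on the gradient, while a large λ lets it dominate the task loss, which matches the qualitative trade-off described above.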

## Ablation Experiments: Over-Alignment Disrupts Learned Structures

With λ fixed at its optimal value, the ablations show that the interference caused by over-alignment comes from the directional structures the model has learned, not from the architecture's capacity or from the auxiliary loss itself. This indicates that the balance between moderate guidance and autonomous learning is crucial.

## Design Principles and Practical Implications

Core principle: intentions held over a moderate time span must stay consistent for enough steps to form combinatorial structures. Implications: architects should choose a subgoal period of 3-6 steps; training should tune the alignment weight; evaluation should use ARC-like abstract reasoning tasks and repeat across multiple sub-experiments.
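One way to act on these implications is a small grid sweep over the subgoal period and the alignment weight, averaged over repeated runs. The harness below is a sketch under assumptions: `sweep` and its `evaluate(period, lam, seed)` callback are hypothetical names, and the callback stands in for whatever training-and-evaluation routine a project actually uses.

```python
from statistics import mean


def sweep(evaluate, periods, lambdas, seeds):
    """Grid-search (P, lambda), averaging the loss over seeds.

    `evaluate(period, lam, seed)` is a user-supplied callable that
    returns a scalar loss (lower is better); it is a placeholder here.
    Returns the best (P, lambda) pair and the full table of mean losses.
    """
    results = {}
    for period in periods:
        for lam in lambdas:
            losses = [evaluate(period, lam, seed) for seed in seeds]
            results[(period, lam)] = mean(losses)
    best = min(results, key=results.get)
    return best, results
```

A call such as `sweep(my_eval, periods=[1, 3, 6, 12], lambdas=[0.01, 0.05, 0.2], seeds=range(3))` would cover the short, moderate, and long regimes discussed above; the specific grid values are illustrative, not taken from the paper.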

## Limitations and Future Research Directions

Limitations: the experiments focus on the ARC benchmark and use fixed P values, and the latent reasoning mechanism lacks transparency. Future directions: generalize to tasks such as code generation, explore mechanisms that adapt P, and build hybrid systems that combine explicit and latent reasoning.
