# CosmicFish-HRM: A Hierarchical Recursive Language Model with Dynamically Adjustable Inference Depth

> Introduces the architectural design of CosmicFish-HRM, including how its Hierarchical Recursive Module (HRM) dynamically allocates computing resources during inference to achieve an efficient and adaptive reasoning process.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-29T16:10:48.000Z
- Last activity: 2026-05-29T16:18:08.352Z
- Keywords: adaptive computation, dynamic inference, hierarchical recursion, language model, Transformer, edge deployment, computational efficiency, HRM
- Page link: https://www.zingnex.cn/en/forum/thread/cosmicfish-hrm
- Canonical: https://www.zingnex.cn/forum/thread/cosmicfish-hrm

---

## CosmicFish-HRM: Guide to the Hierarchical Recursive Language Model with Dynamically Adjustable Inference Depth

CosmicFish-HRM is a language model developed by the Mistyoz AI team in Hyderabad, India. Its core innovation is the Hierarchical Recursive Module (HRM), which decides at inference time how much computation each token receives. Traditional large language models spend the same amount of compute on every token regardless of difficulty; HRM uses adaptive computation to reduce that waste, which also makes the model a better fit for edge deployment. The code is open-sourced under the Apache-2.0 license and covers the complete training and inference workflow.

## Background: Challenges in LLM Inference Efficiency and the Direction of Adaptive Computing

Current large language models spend the same amount of computation on every input regardless of its complexity, which wastes resources and limits deployment in resource-constrained environments. Adaptive computation aims to let a model decide how many inference steps a given input actually needs; CosmicFish-HRM is one concrete implementation of this research direction.

## Project Overview: Core Components and Open-Source Information

- Development team: Mistyoz AI (Hyderabad, India)
- Core component: Hierarchical Recursive Module (HRM), enabling dynamic adjustment of inference depth
- Open-source license: Apache-2.0
- Source platform: GitHub, release date May 29, 2026
- Scope of the code: complete workflows for data preparation, training, fine-tuning, and quantization

## Core Architecture: Dynamic Inference Mechanism and Technical Details

### Overall Structure
Input Transformer blocks → HRM core → Output Transformer blocks → Language model head
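The data flow above can be sketched as plain function composition. This is a structural sketch only: the function names and the toy stand-in blocks are hypothetical, and real blocks would be attention + MLP layers operating on tensors rather than Python lists. The six-layer stacks follow the configuration listed later in this post.

```python
from typing import Callable, List, Sequence

Vector = List[float]               # stand-in for one token's hidden state
Block = Callable[[Vector], Vector]


def run_blocks(blocks: Sequence[Block], h: Vector) -> Vector:
    """Apply a stack of blocks in order, as a Transformer stack would."""
    for block in blocks:
        h = block(h)
    return h


def forward(x: Vector,
            input_blocks: Sequence[Block],
            hrm_core: Block,
            output_blocks: Sequence[Block],
            lm_head: Callable[[Vector], Vector]) -> Vector:
    """Input blocks -> HRM core -> output blocks -> LM head (returns logits)."""
    h = run_blocks(input_blocks, x)
    h = hrm_core(h)                # adaptive-depth reasoning happens here
    h = run_blocks(output_blocks, h)
    return lm_head(h)


# Toy stand-ins (hypothetical), just to exercise the wiring.
identity: Block = lambda v: list(v)
double: Block = lambda v: [2 * a for a in v]
logits = forward([1.0, -0.5], [identity] * 6, double, [identity] * 6,
                 lm_head=lambda v: [sum(v), -sum(v)])
print(logits)  # [1.0, -1.0]
```

Keeping the HRM core as a single callable between two fixed stacks reflects the design described here: only the middle stage varies its depth per token, while the input and output stacks always run once.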

### HRM Module
- H-level: high-level (coarse) semantic understanding and modeling of relationships across tokens
- L-level: Fine-grained feature extraction and local pattern recognition
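The post does not say how the two levels interact. A common arrangement in hierarchical recurrent designs, assumed here, is that the fast L-level runs several refinement updates for every single update of the slower H-level. In the sketch below, `hrm_cycle`, `l_steps`, and the update signatures are all hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
LUpdate = Callable[[Vector, Vector, Vector], Vector]  # (z_l, z_h, x) -> new z_l
HUpdate = Callable[[Vector, Vector], Vector]          # (z_h, z_l) -> new z_h


def hrm_cycle(z_h: Vector, z_l: Vector, x: Vector,
              h_update: HUpdate, l_update: LUpdate,
              l_steps: int = 2) -> Tuple[Vector, Vector]:
    """One H-level update preceded by several L-level updates (assumed schedule)."""
    for _ in range(l_steps):
        # L-level: fine-grained refinement, given the current H state and the input.
        z_l = l_update(z_l, z_h, x)
    # H-level: slower, global update that absorbs what the L-level produced.
    z_h = h_update(z_h, z_l)
    return z_h, z_l


# Toy updates (hypothetical): L accumulates the input, H adds the L state.
l_upd: LUpdate = lambda zl, zh, x: [a + b for a, b in zip(zl, x)]
h_upd: HUpdate = lambda zh, zl: [a + b for a, b in zip(zh, zl)]
z_h, z_l = hrm_cycle([0.0], [0.0], [1.0], h_upd, l_upd, l_steps=3)
print(z_h, z_l)  # [3.0] [3.0]
```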

### Dynamic Stopping Mechanism
After each HRM step, a halt/continue Q-head reads the hidden state and decides whether to run another reasoning step or pass the result on to the output blocks. Simple tokens need only 1-2 steps, while complex tokens can use up to 16 steps.
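A minimal sketch of the stopping loop, assuming the Q-head returns a `(q_halt, q_continue)` pair and the model halts as soon as `q_halt > q_continue`, with the 16-step cap from the text. The `force_steps` argument is a hypothetical stand-in for what the `--force_hrm_steps` option is described as doing; the function names and toy Q-head are illustrative, not the project's code.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
MAX_HRM_STEPS = 16  # upper bound on reasoning steps stated in the text


def adaptive_depth(h: Vector,
                   step_fn: Callable[[Vector], Vector],
                   q_head: Callable[[Vector], Tuple[float, float]],
                   max_steps: int = MAX_HRM_STEPS,
                   force_steps: Optional[int] = None) -> Tuple[Vector, int]:
    """Run HRM steps until the Q-head prefers halting or the step budget runs out."""
    budget = force_steps if force_steps is not None else max_steps
    steps = 0
    while steps < budget:
        h = step_fn(h)        # one recursive HRM step
        steps += 1
        if force_steps is not None:
            continue          # manual override: ignore the Q-head
        q_halt, q_continue = q_head(h)
        if q_halt > q_continue:
            break             # halting now looks better than continuing
    return h, steps


halve = lambda v: [a / 2 for a in v]
# Toy Q-head (hypothetical): halting becomes more attractive as the state shrinks.
toy_q = lambda v: (1.0 - abs(v[0]), abs(v[0]))
print(adaptive_depth([1.0], halve, toy_q)[1])                 # 2
print(adaptive_depth([1.0], halve, lambda v: (0.0, 1.0))[1])  # 16
print(adaptive_depth([1.0], halve, toy_q, force_steps=5)[1])  # 5
```

The cap is what keeps worst-case latency bounded: even a token the Q-head never wants to release costs at most 16 HRM steps.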

### Technical Configuration
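- Vocabulary size: 50304
- Embedding dimension: 448
- Context length: 512 tokens
- Layers: 6 input and 6 output Transformer layers; 4 H-level and 4 L-level layers in the HRM
- Attention: grouped-query attention (GQA)
- Positional encoding: RoPE (rotary position embeddings)
- Plus other settings not listed here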
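The configuration can be captured as a small dataclass. The field names here are hypothetical (the repository's own names may differ), and GQA head counts are omitted because the post does not give them. The two derived properties are simple arithmetic on the listed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HRMConfig:
    # Values from the configuration above; field names are hypothetical.
    vocab_size: int = 50304
    n_embd: int = 448            # embedding dimension
    block_size: int = 512        # context length in tokens
    n_input_layers: int = 6
    n_output_layers: int = 6
    n_h_layers: int = 4          # HRM H-level layers
    n_l_layers: int = 4          # HRM L-level layers
    max_hrm_steps: int = 16      # from the dynamic stopping description

    @property
    def total_layers(self) -> int:
        """Distinct layers holding weights (not the effective depth per token)."""
        return (self.n_input_layers + self.n_output_layers
                + self.n_h_layers + self.n_l_layers)

    @property
    def embedding_params(self) -> int:
        """Size of the token embedding table: vocab_size x n_embd weights."""
        return self.vocab_size * self.n_embd


cfg = HRMConfig()
print(cfg.total_layers)      # 20
print(cfg.embedding_params)  # 22536192
```

If, as the word "recursive" suggests, the HRM layers are reused at every reasoning step, the effective depth a token sees grows with its step count even though the number of distinct layers stays at 20.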

## Training Workflow: Data Composition and Multi-Stage Strategy

### Data Composition
Approximately 10 billion tokens, sourced from FineWeb (3B), Wikipedia (3B), OpenWebText (1B), C4 (1B), CodeParrot (1B), OpenWebMath (500M), and ArXiv (500M).
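The listed budgets sum to exactly 10B tokens, so each source's share of the mixture follows directly. The sketch below computes those shares and draws a source proportionally to them; sampling proportional to token count is an assumption, since the post does not say whether the pipeline samples sources this way or simply concatenates shards.

```python
import random
from typing import Dict

# Token budget per source, in billions, as listed above.
MIXTURE_BTOKENS: Dict[str, float] = {
    "FineWeb": 3.0, "Wikipedia": 3.0, "OpenWebText": 1.0, "C4": 1.0,
    "CodeParrot": 1.0, "OpenWebMath": 0.5, "ArXiv": 0.5,
}


def mixture_weights(budget: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-source token budgets into sampling probabilities."""
    total = sum(budget.values())
    return {name: tokens / total for name, tokens in budget.items()}


def sample_source(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick the source of the next training document (assumed strategy)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = mixture_weights(MIXTURE_BTOKENS)
print(sum(MIXTURE_BTOKENS.values()))         # 10.0
print(weights["FineWeb"], weights["ArXiv"])  # 0.3 0.05
print(sample_source(weights, random.Random(0)))
```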

### Training Stages
1. Pre-training: Basic language modeling on 10 billion tokens
2. Dialogue fine-tuning: Adapt the model to a conversational format
3. Instruction fine-tuning: Train on the Alpaca-GPT4 dataset
4. Identity calibration: Stabilize the model's persona using a mix of identity and dialogue data

### Optimization Strategy
The training loss includes a penalty on the number of reasoning steps, so the model learns to halt early when extra steps would not improve its predictions.
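The post does not give the penalty's exact form. One differentiable option, assumed here and similar in spirit to the ponder cost of Adaptive Computation Time, is to penalize the *expected* step count computed from per-step halting probabilities (for example, a sigmoid of `q_halt - q_continue`, also an assumption); a penalty on raw integer step counts would carry no gradient. `expected_steps`, `hrm_training_loss`, and `step_penalty` are hypothetical names.

```python
from typing import List, Sequence


def expected_steps(halt_probs: Sequence[float]) -> float:
    """Expected step count given per-step halt probabilities p_1..p_T.

    Stopping at step t has probability p_t * prod_{s<t}(1 - p_s); if the model
    never halts, it is stopped at the cap T.
    """
    survive = 1.0    # probability of still running before step t
    expected = 0.0
    for t, p in enumerate(halt_probs, start=1):
        expected += t * survive * p
        survive *= 1.0 - p
    return expected + len(halt_probs) * survive  # forced stop at the cap


def hrm_training_loss(lm_loss: float,
                      halt_probs_per_token: List[Sequence[float]],
                      step_penalty: float = 0.01) -> float:
    """LM loss plus step_penalty times the mean expected step count per token."""
    if not halt_probs_per_token:
        return lm_loss
    mean_steps = (sum(expected_steps(p) for p in halt_probs_per_token)
                  / len(halt_probs_per_token))
    return lm_loss + step_penalty * mean_steps


print(expected_steps([0.5, 1.0]))  # 1.5
eager = hrm_training_loss(2.0, [[0.9, 1.0]], step_penalty=0.1)
hesitant = hrm_training_loss(2.0, [[0.1, 0.1, 0.1, 1.0]], step_penalty=0.1)
print(eager < hesitant)  # True
```

With equal language-modeling loss, the token that halts early contributes a smaller total loss, which is the pressure toward efficient reasoning the text describes.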

## Inference and Usage: Interactive Features and Parameter Adjustment

- Interactive interface: `chat.py` lets users adjust generation parameters such as temperature, max_tokens, and top_k
- Key options: `--show_hrm_steps` displays the number of reasoning steps used; `--force_hrm_steps` overrides the step count manually
- Supported in-chat commands: `/temp` (set temperature), `/tokens` (maximum number of generated tokens), `/hrm` (set the HRM step count), and others
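The post does not describe how `chat.py` uses temperature and top_k internally. The sketch below is a standard temperature + top-k sampler, assumed to be roughly what those parameters control; `sample_next_token` is a hypothetical name, not the script's API.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(logits: Sequence[float], temperature: float = 1.0,
                      top_k: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> int:
    """Pick a token id from raw logits using temperature scaling and top-k filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    # Keep only the top_k highest-scoring ids (all ids if top_k is None or 0).
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    candidates = ranked[:top_k] if top_k else ranked
    # Softmax over the survivors; subtracting the max keeps exp() from overflowing.
    m = scaled[candidates[0]]
    weights = [math.exp(scaled[i] - m) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [0.1, 2.0, 1.0]
print(sample_next_token(logits, temperature=0.7, top_k=1))  # 1
print(sample_next_token(logits, temperature=0))             # 1
print(sample_next_token(logits, temperature=1.2, top_k=2, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, and a small top_k removes low-probability tail tokens entirely; top_k=1 is equivalent to greedy decoding.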

## Practical Significance, Limitations, and Summary Recommendations

### Significance and Advantages
- Explores allocating compute dynamically at inference time, complementing efficiency work that focuses on pre-training
- Suited to edge deployment: worst-case latency is bounded by the 16-step cap, and compute is spent only where tokens need it

### Limitations
The context length is only 512 tokens, and the model is relatively small (around 400M parameters).

### Summary Recommendations
CosmicFish-HRM achieves adaptive reasoning through its HRM and offers a useful reference for building efficient language models. Developers researching adaptive computation are encouraged to study the project's open-source code.