Zing Forum

CosmicFish-HRM: A Hierarchical Recursive Language Model with Dynamically Adjustable Inference Depth

An introduction to the architectural design of CosmicFish-HRM, including how its Hierarchical Recursive Module (HRM) dynamically allocates computing resources during inference to make reasoning efficient and adaptive.

Adaptive computing, Dynamic inference, Hierarchical recursion, Language model, Transformer, Edge deployment, Computational efficiency, HRM
Published 2026-05-30 00:10 | Recent activity 2026-05-30 00:18 | Estimated read 6 min

Section 01

CosmicFish-HRM: Guide to the Hierarchical Recursive Language Model with Dynamically Adjustable Inference Depth

CosmicFish-HRM is a language model developed by the Mistyoz AI team in Hyderabad, India. Its core innovation is the Hierarchical Recursive Module (HRM), which dynamically allocates computing resources during inference. This targets the resource waste caused by the "one-size-fits-all" inference approach of traditional large language models, which spend the same compute on every input. The design supports adaptive computing and is suited to edge deployment. The code is open-sourced under the Apache-2.0 license and includes complete training and inference workflows.

Section 02

Background: Challenges in LLM Inference Efficiency and the Direction of Adaptive Computing

Most current large language models spend the same amount of compute on every input regardless of its complexity. This wastes resources on easy inputs and limits deployment in resource-constrained environments. Adaptive computing aims to let the model itself decide how many inference steps are necessary, and CosmicFish-HRM is a concrete implementation of this research direction.

Section 03

Project Overview: Core Components and Open-Source Information

  • Development team: Mistyoz AI (Hyderabad, India)
  • Core component: Hierarchical Recursive Module (HRM), enabling dynamic adjustment of inference depth
  • Open-source license: Apache-2.0
  • Source platform: GitHub, release date May 29, 2026
  • Code coverage: Complete workflows for data preparation, training, fine-tuning, and quantization

Section 04

Core Architecture: Dynamic Inference Mechanism and Technical Details

Overall Structure

Input Transformer block → HRM core → Output Transformer block → Language model head

HRM Module

  • H-level: Macro semantic understanding and cross-token relationship modeling
  • L-level: Fine-grained feature extraction and local pattern recognition

Dynamic Stopping Mechanism

A halt/continue Q-head evaluates the hidden states and decides whether to continue reasoning or to produce the output. Simple tokens require only 1-2 steps, while complex tokens can take up to 16 steps.
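
The halting loop described above can be sketched in a few lines of Python. This is an illustrative toy, not the project's code: `run_hrm`, `toy_refine`, and `toy_q_head` are hypothetical names, the hidden state is a plain list of floats, and the rule "halt when the halt score exceeds the continue score" is an assumption about how the two Q-values are compared. Only the 16-step cap comes from the text.

```python
from typing import Callable, List, Tuple

MAX_STEPS = 16  # upper bound on HRM steps quoted in the text

Hidden = List[float]


def run_hrm(
    hidden: Hidden,
    refine: Callable[[Hidden], Hidden],
    q_head: Callable[[Hidden], Tuple[float, float]],
    max_steps: int = MAX_STEPS,
) -> Tuple[Hidden, int]:
    """Apply `refine` repeatedly until the Q-head prefers halting
    or the step budget is exhausted. Returns (hidden, steps_used)."""
    steps = 0
    while steps < max_steps:
        hidden = refine(hidden)
        steps += 1
        q_halt, q_continue = q_head(hidden)
        if q_halt > q_continue:  # assumed decision rule
            break
    return hidden, steps


def toy_refine(hidden: Hidden) -> Hidden:
    """Stand-in for one HRM pass: shrink every value by half."""
    return [0.5 * x for x in hidden]


def toy_q_head(hidden: Hidden) -> Tuple[float, float]:
    """Stand-in Q-head: prefer halting once all values are below 1.0."""
    q_halt = 1.0 - max(abs(x) for x in hidden)
    q_continue = 0.0
    return q_halt, q_continue


if __name__ == "__main__":
    for start in ([1.5], [1000.0], [1e9]):
        _, used = run_hrm(start, toy_refine, toy_q_head)
        print(start, "->", used, "steps")
```

With these toy functions, an input that is almost converged stops after a single step, a harder one takes more steps, and one that never satisfies the halting rule is cut off at the 16-step cap.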

Technical Configuration

Vocabulary size: 50304; embedding dimension: 448; context length: 512; 6 input and 6 output Transformer layers; 4 H-level and 4 L-level HRM layers; GQA (grouped-query attention) heads; RoPE positional encoding; etc.
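
For readers who prefer to see the configuration as code, the values listed above could be gathered into a small config object. The class and field names below are hypothetical and chosen for illustration; they are not taken from the repository. Settings the text does not specify (number of attention heads, GQA group size, MLP width) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HRMConfig:
    """Hypothetical container for the values quoted in the text."""
    vocab_size: int = 50304
    n_embd: int = 448          # embedding dimension
    block_size: int = 512      # context length in tokens
    n_input_layers: int = 6    # input Transformer blocks
    n_output_layers: int = 6   # output Transformer blocks
    n_h_layers: int = 4        # HRM H-level layers
    n_l_layers: int = 4        # HRM L-level layers
    max_hrm_steps: int = 16    # cap on recursive steps

    @property
    def embedding_params(self) -> int:
        """Size of the token embedding table (vocab_size x n_embd)."""
        return self.vocab_size * self.n_embd

    @property
    def total_layers(self) -> int:
        """Distinct layers, counting each HRM layer once even
        though it may be applied for several steps."""
        return (self.n_input_layers + self.n_output_layers
                + self.n_h_layers + self.n_l_layers)


if __name__ == "__main__":
    cfg = HRMConfig()
    print(cfg.embedding_params)  # 22536192
    print(cfg.total_layers)      # 20
```

The embedding table alone accounts for 50304 × 448 = 22,536,192 parameters; the sketch makes no claim about the model's total parameter count, which depends on the unspecified settings.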

Section 05

Training Workflow: Data Composition and Multi-Stage Strategy

Data Composition

The training corpus contains approximately 10 billion tokens, drawn from FineWeb (3B), Wikipedia (3B), OpenWebText (1B), C4 (1B), CodeParrot (1B), OpenWebMath (500M), and ArXiv (500M).
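
The token counts above add up to the stated 10 billion, and they translate directly into per-source sampling weights. The sketch below checks the arithmetic and shows one way such a mixture could be sampled. The function names are hypothetical, and sampling sources in proportion to their token counts is an assumption, not something the source describes.

```python
import random
from typing import Dict, List

# Token counts from the text, in millions of tokens.
TOKENS_M: Dict[str, int] = {
    "FineWeb": 3000,
    "Wikipedia": 3000,
    "OpenWebText": 1000,
    "C4": 1000,
    "CodeParrot": 1000,
    "OpenWebMath": 500,
    "ArXiv": 500,
}


def mixture_weights(tokens: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the corpus contributed by each source."""
    total = sum(tokens.values())
    return {name: count / total for name, count in tokens.items()}


def sample_sources(weights: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k source names in proportion to their weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=k)


if __name__ == "__main__":
    print(sum(TOKENS_M.values()), "million tokens in total")
    for name, w in mixture_weights(TOKENS_M).items():
        print(f"{name:12s} {w:.0%}")
```

Under this proportional scheme, FineWeb and Wikipedia each supply 30% of the mixture, the three 1B sources 10% each, and OpenWebMath and ArXiv 5% each.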

Training Stages

  1. Pre-training: Basic language modeling on 10 billion tokens
  2. Dialogue fine-tuning: Adjust to dialogue format
  3. Instruction fine-tuning: Train on the Alpaca-GPT4 dataset
  4. Identity calibration: Stabilize personality using mixed identity and dialogue data

Optimization Strategy

The training loss includes a step penalty, so that using fewer HRM steps lowers the loss and the model is encouraged to reason efficiently.
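
The source does not give the exact form of the step penalty. One simple possibility, shown here purely as an assumption, is to add a term proportional to the mean number of HRM steps to the language-modeling loss. The function name, the linear form, and the default coefficient are all hypothetical.

```python
from typing import Sequence


def penalized_loss(
    lm_loss: float,
    steps_used: Sequence[int],
    step_penalty: float = 0.01,  # hypothetical coefficient
) -> float:
    """Language-modeling loss plus a linear penalty on the mean
    number of HRM steps (an assumed form, not the project's)."""
    if not steps_used:
        raise ValueError("steps_used must not be empty")
    mean_steps = sum(steps_used) / len(steps_used)
    return lm_loss + step_penalty * mean_steps


if __name__ == "__main__":
    # Same LM loss, different step counts: fewer steps gives a lower total.
    print(penalized_loss(2.0, [1, 2, 1]))
    print(penalized_loss(2.0, [16, 16, 16]))
```

With a penalty of this kind, two predictions of equal quality are distinguished by how many steps they used, which pushes the halting head toward stopping early on easy tokens.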

Section 06

Inference and Usage: Interactive Features and Parameter Adjustment

  • Interactive interface: chat.py supports parameter adjustment for temperature, max_tokens, top_k, etc.
  • Key options: --show_hrm_steps to display inference steps, --force_hrm_steps for manual override
  • Supported commands: /temp (adjust temperature), /tokens (max generation count), /hrm (adjust steps), etc.
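
To make the in-chat commands listed above concrete, here is a minimal sketch of how a chat loop could parse them. It is not the project's chat.py: the `ChatSettings` class, its default values, the `apply_command` function, and the "/command value" argument syntax are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatSettings:
    """Hypothetical session settings; defaults are placeholders."""
    temperature: float = 0.7
    max_tokens: int = 256
    hrm_steps: Optional[int] = None  # None: let the model decide


def apply_command(line: str, settings: ChatSettings) -> bool:
    """Update `settings` from a line such as '/temp 0.5'.
    Returns True if the line was a recognized, valid command."""
    parts = line.strip().split()
    if len(parts) != 2 or not parts[0].startswith("/"):
        return False
    name, value = parts
    try:
        if name == "/temp":
            settings.temperature = float(value)
        elif name == "/tokens":
            settings.max_tokens = int(value)
        elif name == "/hrm":
            settings.hrm_steps = int(value)
        else:
            return False
    except ValueError:
        # Value could not be parsed; leave settings unchanged.
        return False
    return True


if __name__ == "__main__":
    s = ChatSettings()
    for text in ("/temp 0.2", "/hrm 4", "hello there"):
        print(repr(text), "->", apply_command(text, s))
    print(s)
```

Lines that are not recognized commands return False, so a caller can pass them on to the model as ordinary chat input.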

Section 07

Practical Significance, Limitations, and Summary Recommendations

Significance and Advantages

  • Explores dynamic computing allocation during inference, complementing pre-training optimization ideas
  • Friendly to edge deployment, with predictable response speed and high resource utilization

Limitations

The context length is only 512 tokens, and the model is relatively small (around 400M parameters).

Summary Recommendations

CosmicFish-HRM achieves adaptive reasoning through its HRM and offers a useful reference for building efficient language models. Developers researching adaptive computing are encouraged to study the project's open-source code.