# Causal Reasoning Action Model: An Agent Planning Method Based Purely on Causal Intervention Without Imitation Learning

> This article introduces an innovative proof-of-concept project that proposes an agent architecture based on causal reasoning. Through a "do-intervention" verification mechanism, an LLM proposes action plans, the agent tests and verifies them in a world model, and a memory system stores the resulting Q-values, yielding fast and reliable cross-domain planning in a pure CPU environment.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-21T15:34:53.000Z
- Last activity: 2026-04-21T15:50:21.373Z
- Popularity: 163.7
- Keywords: causal reasoning, agent, large language model, do-intervention, imitation learning, reinforcement learning, world model, Q-value learning, planning algorithms, causal inference
- Page link: https://www.zingnex.cn/en/forum/thread/llm-github-farmountain-large-reasoning-action-model-whitepaper-poc
- Canonical: https://www.zingnex.cn/forum/thread/llm-github-farmountain-large-reasoning-action-model-whitepaper-poc
- Markdown source: floors_fallback

---

## Causal Reasoning Action Model: A New Paradigm for Agent Planning Without Imitation Learning

This article introduces the Large Reasoning Action Model (LRAM), an innovative proof-of-concept project that proposes an agent architecture based on causal reasoning. Abandoning imitation learning, the architecture achieves fast and reliable cross-domain planning in a pure CPU environment through three steps: an LLM proposes action plans, a causal agent performs do-intervention verification in the world model, and a memory system stores the resulting Q-values. Its core is a decision-making paradigm grounded in causal understanding rather than replication of historical patterns.

## Background: Limitations of Imitation Learning and the Necessity of Causal Reasoning

Most current LLM agents rely on imitation learning to replicate observed behavioral patterns, but they struggle to handle novel scenarios and easily inherit data biases. The LRAM project shifts to a pure causal reasoning mechanism, arguing that true intelligent decision-making should be based on causal understanding of action consequences rather than simple replication of historical patterns.

## System Architecture: A Closed-Loop Causal Agent with Three Collaborative Layers

The LRAM architecture integrates three key components to form a decision-making closed loop: 
1. **LLM as the proposer**: A general large model generates candidate action plans; 
2. **Causal agent as the verifier**: Sends the LLM's suggestions to the world model for do-intervention verification; 
3. **Memory system as the value storage**: Encodes verification results into Q-values for storage and builds a causal association graph between actions and outcomes.
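The three-layer closed loop can be sketched as a propose-verify-store cycle. The sketch below is a minimal illustration of that flow; all function names and the toy world-model dynamics are assumptions for demonstration, not the project's actual API.

```python
# Hypothetical sketch of the LRAM propose -> verify -> store loop.
# llm_propose, world_model_rollout, and causal_verify are illustrative stand-ins.

def llm_propose(state, n=3):
    """Stand-in for the LLM proposer: return candidate actions for a state."""
    return [f"action_{i}" for i in range(n)]

def world_model_rollout(state, action):
    """Stand-in world model: simulate executing `action` and return a reward.

    Deterministic toy dynamics so the example is reproducible.
    """
    return len(action) % 5 + (0.1 if state == "start" else 0.0)

def causal_verify(state, action, trials=4):
    """Do-intervention: force `action` in the world model, average the returns."""
    return sum(world_model_rollout(state, action) for _ in range(trials)) / trials

def plan_step(state, memory):
    """One closed-loop iteration: propose, verify each candidate, store Q(s, a),
    then select the action with the highest verified value."""
    candidates = llm_propose(state)
    for a in candidates:
        memory[(state, a)] = causal_verify(state, a)  # Q-value into memory
    return max(candidates, key=lambda a: memory[(state, a)])

memory = {}
best = plan_step("start", memory)
```

Note that the LLM never decides alone: every candidate passes through the verification layer before selection, and the verified Q-values persist in memory for reuse.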

## Core Mechanism: The Principle of Causal Verification via Do-Intervention

Do-intervention is the core mechanism that distinguishes LRAM from traditional methods: after the LLM proposes an action, the agent constructs a hypothetical scenario (the consequences of executing that action) in its internal world model, estimates the expected return over multiple simulated trials, and selects the actions that prove effective. This mechanism verifies the causal effects of action sequences and can capture causal structures that purely statistical methods struggle to find; because no real-environment interaction is required, it is also safe and efficient.
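Why do-intervention differs from merely conditioning on observed data can be shown with a toy structural causal model containing a hidden confounder. The variables (U, A, Y) and the simulator below are illustrative assumptions, not part of the LRAM codebase.

```python
import random

# Toy SCM with a hidden confounder: U -> A, U -> Y, A -> Y.
# Conditioning on A=1 mixes in U's effect; do(A=1) severs the U -> A edge.

def mean_y_given_a1(intervene_a=None, n=50_000, seed=0):
    """Return E[Y | A=1] (observational) or E[Y | do(A=1)] (interventional)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = rng.random() < 0.5                               # hidden confounder
        a = bool(u) if intervene_a is None else intervene_a  # do() cuts U -> A
        if a:                                                # keep cases with A = 1
            ys.append(int(a) + 2 * int(u))                   # Y := A + 2U
    return sum(ys) / len(ys)

observational = mean_y_given_a1()                 # confounded estimate
interventional = mean_y_given_a1(intervene_a=True)  # true causal effect
```

Observationally, A = 1 only occurs when U = 1, so the estimate is inflated to 3.0; under do(A=1) the confounder is averaged out and the true effect is about 2.0. This is exactly the kind of causal structure a purely statistical agent would misread.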

## Memory System: Storage and Reuse of Q-Values After Causal Verification

The memory system stores Q-values (expected return of executing action A in state S) that have undergone causal verification, with three key advantages: 
- **Interpretability**: Q-values correspond to clear causal verification history; 
- **Updatability**: Relevant memories can be selectively re-verified when the world model is updated; 
- **Transferability**: Abstract causal structures can be reused across domains to accelerate learning in new domains.
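A Q-value store with these three properties might record provenance alongside each value, so that entries verified under an outdated world model can be found and re-verified. The schema below is a hypothetical sketch; the field names are assumptions, not the project's actual data model.

```python
from dataclasses import dataclass

# Hypothetical Q-value memory with verification provenance.

@dataclass
class QEntry:
    q: float            # verified expected return for (state, action)
    model_version: int  # world-model version the verification ran under
    trials: int         # number of do-intervention rollouts backing the value

class QMemory:
    def __init__(self):
        self._table = {}

    def store(self, state, action, q, model_version, trials):
        """Record a causally verified Q-value with its provenance."""
        self._table[(state, action)] = QEntry(q, model_version, trials)

    def lookup(self, state, action):
        """Return the stored Q-value, or None if never verified."""
        entry = self._table.get((state, action))
        return entry.q if entry else None

    def stale_keys(self, current_version):
        """Entries verified under an older world model: candidates to re-verify."""
        return [k for k, e in self._table.items()
                if e.model_version < current_version]

mem = QMemory()
mem.store("s0", "a0", 1.5, model_version=1, trials=8)
mem.store("s0", "a1", 0.7, model_version=2, trials=8)
```

Storing the model version and trial count is what makes the memory interpretable (each value traces back to a verification run) and updatable (stale entries are cheap to enumerate after a world-model update).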

## Performance Evidence: Cross-Domain Convergence in a Pure CPU Environment

LRAM achieves fast and reliable planning convergence in four different domains in a pure CPU environment. Its cross-domain generalization ability stems from the domain-agnostic nature of the causal mechanism—only the domain definition of the world model needs to be changed, and the causal verification engine can be reused; whereas imitation learning requires collecting specialized data and retraining for each domain.
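The claimed separation, a fixed causal verification engine plus swappable domain-specific world models, can be illustrated with a small interface sketch. The `WorldModel` protocol and both toy domains are assumptions for demonstration, not the project's real interfaces.

```python
from typing import Protocol

# Sketch of the domain-agnostic split: verify() is reused unchanged,
# only the world model changes per domain. All names are illustrative.

class WorldModel(Protocol):
    def simulate(self, state, action) -> float: ...

class GridWorld:
    """Toy navigation domain."""
    def simulate(self, state, action):
        return 1.0 if action == "goal" else 0.0

class InventoryWorld:
    """Toy inventory-management domain."""
    def simulate(self, state, action):
        return 2.0 if action == "restock" else 0.5

def verify(model: WorldModel, state, actions, trials=3):
    """Domain-agnostic do-intervention engine: average simulated returns."""
    return {a: sum(model.simulate(state, a) for _ in range(trials)) / trials
            for a in actions}

grid_q = verify(GridWorld(), "s", ["goal", "left"])
inv_q = verify(InventoryWorld(), "s", ["restock", "wait"])
```

Switching domains here means supplying a different `simulate`, while the verification loop itself is untouched; this is the structural reason cross-domain reuse is cheap compared to retraining an imitation-learning policy per domain.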

## Comparative Analysis: Differences Between LRAM and Mainstream Methods

Comparison with existing methods: 
- **Traditional reinforcement learning**: Low sample efficiency, requiring a large number of environment interactions; LRAM reduces real interactions through LLM priors and causal verification; 
- **Imitation learning**: Relies on expert data, limited by coverage; LRAM discovers strategies autonomously; 
- **LLM-based agents (e.g., ReAct)**: Lack systematic verification, prone to hallucinations; LRAM ensures decisions are based on real causality through the causal verification layer.

## Future Outlook: Directions for Causal Reasoning and General Intelligence

Methodological insights from LRAM: 
- Causal understanding is the foundation of AI reliability; 
- Collaboration between LLMs and causal reasoning breaks through the limitations of each alone; 
- Causal meta-learning is key to general intelligence.

Looking ahead, we can expect more sophisticated causal reasoning capabilities, more efficient verification algorithms, and applications in real-world scenarios.
