# eMoT: Dynamic Memory-of-Thought Framework Achieves 100% Accuracy on Game of 24, Enabling Strong Reasoning in Lightweight Models

> eMoT uses three core modules—memory corrosion, symbolic anchoring, and consistency refinement—to treat reasoning trajectories as dynamically evolving memories rather than static templates, enabling lightweight models to achieve reasoning performance that surpasses large-scale models.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-01T10:41:12.000Z
- Last activity: 2026-06-02T03:23:02.877Z
- Popularity: 143.3
- Keywords: eMoT, memory of thought, neuro-symbolic AI, reasoning enhancement, Game of 24, multi-step reasoning, memory corrosion, symbolic anchoring
- Page link: https://www.zingnex.cn/en/forum/thread/emot-game-of-24-100
- Canonical: https://www.zingnex.cn/forum/thread/emot-game-of-24-100

---

## Introduction: eMoT Framework Enables Strong Reasoning in Lightweight Models, Achieves 100% Accuracy on Game of 24

eMoT (evolving Memory-of-Thought) is a dynamic memory-of-thought framework. Through three core modules (memory corrosion, symbolic anchoring, and consistency refinement), it treats reasoning trajectories as dynamically evolving memories instead of static templates. With this design, lightweight models are reported to outperform large-scale models on reasoning tasks, including 100% accuracy on the classic mathematical reasoning task Game of 24.

## Problem Background: Two Core Defects in Large Model Reasoning

Large Language Models (LLMs) have two core defects in multi-step reasoning:
1. **Hallucination Problem**: An intermediate step can produce an incorrect conclusion, later steps keep building on it, and the model rarely corrects itself;
2. **Weak Numerical Calculation Ability**: Exact arithmetic often goes wrong, whereas humans typically hand such calculations to a tool.

The root cause is that LLMs treat reasoning as a one-time generation process. They cannot retain or reuse solution logic that has already worked, so every new problem is reasoned from scratch.

## Analysis of eMoT's Three Core Modules

The eMoT framework includes three core modules:
- **Memory Corrosion Mechanism**: Strengthens frequently used effective reasoning paths, attenuates low-frequency patterns, and maintains dynamic balance—similar to the reinforcement and forgetting of biological memory;
- **Symbolic Anchoring Engine**: Calls a Python interpreter to perform deterministic calculations when encountering numerical operations, combining the flexibility of neural networks with the precision of symbolic systems;
- **Consistency-Driven Refinement**: Cross-validates each reasoning step with symbolic results, detects deviations, and iteratively corrects them to prevent error accumulation.
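
The reinforcement/attenuation idea and the step-level cross-check described above can be illustrated with a minimal sketch. This is not eMoT's actual implementation: the class `ThoughtMemory`, the parameters `decay`, `boost`, and `floor`, the update rule (additive boost, multiplicative decay), and the helper `step_is_consistent` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from fractions import Fraction
import operator


@dataclass
class ThoughtMemory:
    """Toy store of reasoning patterns with reinforcement and decay (hypothetical)."""
    decay: float = 0.9   # assumed multiplicative attenuation per corrosion step
    boost: float = 1.0   # assumed additive reinforcement per successful reuse
    floor: float = 0.05  # assumed threshold below which a pattern is forgotten
    strength: dict = field(default_factory=dict)

    def reinforce(self, pattern: str) -> None:
        # A pattern that helped solve a problem gains strength.
        self.strength[pattern] = self.strength.get(pattern, 0.0) + self.boost

    def corrode(self) -> None:
        # Every pattern fades; those that fall under the floor are dropped.
        faded = {p: s * self.decay for p, s in self.strength.items()}
        self.strength = {p: s for p, s in faded.items() if s >= self.floor}

    def top(self, k: int = 1) -> list:
        # Strongest patterns first, as candidates for reuse.
        return sorted(self.strength, key=self.strength.get, reverse=True)[:k]


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def step_is_consistent(a, op, b, claimed) -> bool:
    """Check a claimed step 'a op b = claimed' against exact arithmetic."""
    if op == "/" and Fraction(b) == 0:
        return False  # division by zero can never be a valid step
    return OPS[op](Fraction(a), Fraction(b)) == Fraction(claimed)


memory = ThoughtMemory()
for _ in range(3):
    memory.reinforce("factor-pairs")   # used often
memory.reinforce("brute-force")        # used once
for _ in range(30):
    memory.corrode()                   # the rarely used pattern fades out
```

In this toy setup the frequently reinforced pattern survives thirty decay steps while the one used once is forgotten, and a step such as `7 * 6 = 41` would be flagged as inconsistent before later steps build on it.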

## Experimental Validation: Perfect Performance on Game of 24 and Improvements Across Multiple Benchmarks

The reported results are:
- **Game of 24 Task**: 100% accuracy, an improvement of up to 17.6% over the baseline;
- **Mathematical Reasoning Benchmarks**: Improvements across datasets such as GSM8K, ASDiv, SVAMP, and MGSM;
- **Lightweight Model Performance**: Strong results with lightweight backbone models, suggesting that the gains come from reasoning control rather than model scale.
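
For readers unfamiliar with the task: Game of 24 asks whether four numbers can be combined with `+`, `-`, `*`, and `/` to reach exactly 24. The sketch below is a plain exhaustive search written for illustration, not the eMoT method. It shows why exact arithmetic matters for this benchmark, since some puzzles only work through fractional intermediate values. The function names `solve24` and `_search` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve24(numbers, target=24):
    """Return one expression string that evaluates to target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left; it either hits the target or not.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two values, combine them in every way, and recurse.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea}+{eb})"),
            (a * b, f"({ea}*{eb})"),
            (a - b, f"({ea}-{eb})"),
            (b - a, f"({eb}-{ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None
```

For example, `solve24([3, 3, 8, 8])` finds a solution only because `Fraction` keeps `8/3` exact (the puzzle resolves as 8 / (3 - 8/3)), while `solve24([1, 1, 1, 1])` returns `None`.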

## Comparison with Related Work: Innovations of eMoT

Compared with related work, eMoT's innovations are:
- **Chain of Thought (CoT)**: CoT is one-time reasoning, while eMoT enables persistent reuse of reasoning patterns;
- **External Memory Systems**: Traditional systems treat all memories equally, while eMoT dynamically evolves memories (reinforcement/attenuation);
- **Tool Usage**: Conventional tool use issues isolated calls, while eMoT integrates symbolic computation into the reasoning process itself.

## Application Scenarios and Deployment Challenges

**Applicable Scenarios**:
1. Reasoning tasks requiring precise calculations (mathematics, physics, etc.);
2. Problems requiring systematic search (planning, scheduling);
3. Batch processing of repetitive reasoning patterns;
4. Resource-constrained environments (edge devices, small teams).

**Deployment Challenges**:
- Additional computational overhead for memory retrieval and symbolic execution;
- Memory requirements for storing historical memories;
- Security isolation issues for executing generated code.
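
One common way to reduce the code-execution risk, when the generated code is only arithmetic, is to avoid `eval` entirely and walk the parsed syntax tree with an allow-list. The sketch below assumes that the symbolic step is a pure arithmetic expression over integer literals; `safe_eval` and `_eval` are hypothetical names, and this is not a description of how eMoT isolates execution. General generated code would still need process-level sandboxing.

```python
import ast
import operator
from fractions import Fraction

# Only these operators are permitted; everything else is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> Fraction:
    """Evaluate an arithmetic expression exactly, without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


def _eval(node):
    # Integer literals only (bool is excluded by the exact type check).
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Names, calls, attribute access, etc. never reach execution.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

With this approach, `safe_eval("8/(3-8/3)")` returns `Fraction(24, 1)`, while an input such as `__import__('os').system('true')` raises `ValueError` instead of running.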

## Limitations and Future Directions

**Current Limitations**:
1. Domain generalization ability needs verification (performance in out-of-training scenarios);
2. Hyperparameter sensitivity (e.g., memory corrosion rate requires task-specific tuning);
3. Interpretability of memory content needs improvement.

**Future Directions**:
1. Hierarchical memory (stratification of long-term/working memory);
2. Multi-agent collaboration with shared memory;
3. Continual learning (online memory updates without forgetting);
4. Cross-modal expansion (vision, audio, etc.).

## Conclusion: Model Scale Is Not the Only Key—Ingenious Design Matters More

eMoT represents a new direction for LLM reasoning enhancement. By combining dynamic memory with symbolic computation, it lets lightweight models reach performance that surpasses large models. The 100% accuracy on Game of 24 supports the value of structured reasoning control and indicates that model scale is not the only determinant of reasoning ability: architecture design and training strategies matter as well. This offers a "small but powerful" methodology for resource-constrained scenarios, and the approach may extend to more fields in the future.
