Zing Forum


eMoT: Dynamic Memory-of-Thought Framework Achieves 100% Accuracy on Game of 24, Enabling Strong Reasoning in Lightweight Models

eMoT uses three core modules (memory corrosion, symbolic anchoring, and consistency refinement) to treat reasoning trajectories as memories that evolve over time rather than as static templates. This lets lightweight models surpass much larger models in reasoning performance.

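eMoT · Memory-of-Thought · Neuro-symbolic AI · Reasoning enhancement · Game of 24 · Multi-step reasoning · Memory corrosion · Symbolic anchoring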
Published 2026-06-01 18:41 · Recent activity 2026-06-02 11:23 · Estimated read 7 min

Section 01

Introduction: eMoT Framework Enables Strong Reasoning in Lightweight Models, Achieves 100% Accuracy on Game of 24

eMoT (evolving Memory-of-Thought) is a dynamic memory-of-thought framework. It has three core modules: memory corrosion, symbolic anchoring, and consistency refinement. Together they treat reasoning trajectories as memories that evolve over time rather than as static templates. The framework lets lightweight models reach reasoning performance that surpasses large-scale models. Most notably, it reaches 100% accuracy on Game of 24, a classic arithmetic puzzle. In Game of 24, four given numbers, each used exactly once, must be combined with addition, subtraction, multiplication, and division to make 24.


Section 02

Problem Background: Two Core Defects in Large Model Reasoning

Large Language Models (LLMs) have two core defects in multi-step reasoning:

  1. Hallucination: an intermediate step can reach an incorrect conclusion that later steps then build on, and the model rarely manages to correct itself;
  2. Weak Numerical Calculation: exact arithmetic often goes wrong, whereas humans habitually reach for a calculator or other tool.

The root cause is that LLMs treat reasoning as a one-time generation process. They cannot retain or reuse solution procedures that worked before, so every problem is reasoned through from scratch.

Section 03

Analysis of eMoT's Three Core Modules

The eMoT framework includes three core modules:

  • Memory Corrosion Mechanism: Strengthens reasoning paths that are used often and succeed, attenuates rarely used patterns, and keeps the store in dynamic balance, much like reinforcement and forgetting in biological memory;
  • Symbolic Anchoring Engine: Hands numerical operations to a Python interpreter for deterministic calculation, combining the flexibility of neural networks with the precision of symbolic systems;
  • Consistency-Driven Refinement: Cross-validates each reasoning step with symbolic results, detects deviations, and iteratively corrects them to prevent error accumulation.
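The article does not say how memory corrosion is computed. Below is a minimal Python sketch that assumes a simple multiplicative decay plus a fixed reinforcement bonus. `CorrodingMemory`, `decay`, `boost`, and `floor` are hypothetical names and values chosen for illustration. They are not eMoT's actual update rule.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One stored reasoning pattern and its current strength (hypothetical schema)."""
    pattern: str
    weight: float


@dataclass
class CorrodingMemory:
    """Toy reinforce-and-decay store; all three parameters are assumed values."""
    decay: float = 0.9   # multiplicative attenuation applied to every entry each round
    boost: float = 1.0   # strength added when a pattern is used successfully
    floor: float = 0.05  # entries weaker than this are forgotten
    entries: dict[str, MemoryEntry] = field(default_factory=dict)

    def step(self, used: set[str]) -> None:
        """Attenuate all entries, reinforce the ones used this round, drop weak ones."""
        for key in list(self.entries):
            entry = self.entries[key]
            entry.weight *= self.decay
            if key in used:
                entry.weight += self.boost
            if entry.weight < self.floor:
                del self.entries[key]
        # Patterns seen for the first time start at the reinforcement strength.
        for key in used:
            if key not in self.entries:
                self.entries[key] = MemoryEntry(key, self.boost)

    def top(self, k: int) -> list[str]:
        """Return the k strongest patterns, e.g. to include in the next prompt."""
        ranked = sorted(self.entries.values(), key=lambda e: e.weight, reverse=True)
        return [e.pattern for e in ranked[:k]]
```

Under these assumed settings:

- A pattern reused every round settles at boost / (1 - decay) = 10.
- A pattern used only once is dropped after 29 idle rounds.

Ranking by weight with `top` is one plausible way to choose which memories to show the model next. eMoT's actual retrieval strategy may differ.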

Section 04

Experimental Validation: Perfect Performance on Game of 24 and Improvements Across Multiple Benchmarks

Experimental validation shows eMoT's breakthrough results:

  • Game of 24 Task: Achieved 100% accuracy, an improvement of up to 17.6% over the baseline;
  • Mathematical Reasoning Benchmarks: Improvements across datasets including GSM8K, ASDiv, SVAMP, and MGSM;
  • Lightweight Model Performance: Strong results with lightweight backbone models, indicating that the gains come from reasoning control rather than model scale.
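Checking a Game of 24 answer is exactly the kind of step that symbolic anchoring is meant to make deterministic. The article does not show eMoT's solver. The brute-force search below (hypothetical `solve24`) only illustrates an exact, deterministic check. It uses Python's `fractions.Fraction`, so intermediate results such as 8/3 stay exact instead of being rounded.

```python
from fractions import Fraction
from itertools import combinations


def solve24(numbers, target=24):
    """Return one expression reaching target with each number used once, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    """Repeatedly combine two items with every operator until one value remains."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None
```

For instance, `solve24([8, 3, 8, 3])` finds the well-known solution 8 / (3 - 8 / 3), and exact rational arithmetic confirms it equals 24. With ordinary floats, the same expression evaluates to a value slightly below 24, so a float-based equality check would wrongly reject it. `solve24([1, 1, 1, 1])` returns `None`.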

Section 05

Comparison with Related Work: Innovations of eMoT

Compared with related work, eMoT's innovations are:

  • Chain of Thought (CoT): CoT reasoning is generated once and then discarded, whereas eMoT keeps successful reasoning patterns for later reuse;
  • External Memory Systems: Traditional systems weight all stored memories equally, whereas eMoT continuously reinforces or attenuates them;
  • Tool Usage: eMoT integrates symbolic computation into the reasoning process itself instead of issuing isolated tool calls.
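Here is a minimal sketch of step-by-step consistency checking that makes this contrast concrete. It rests on two assumptions:

- each reasoning step is recorded as a `(left, op, right, claimed)` tuple;
- `'#k'` refers to the verified result of step k.

The tuple format, `refine`, and `resolve` are hypothetical; the article does not describe eMoT's internal step representation. In this sketch, a deviation is simply replaced by the exact value. A full system would more plausibly feed the deviation log back to the model for another attempt, which is not shown.

```python
from fractions import Fraction
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def resolve(operand, results):
    """An operand is a literal number or '#k', meaning the verified result of step k."""
    if isinstance(operand, str) and operand.startswith("#"):
        return results[int(operand[1:])]
    return Fraction(operand)


def refine(steps):
    """Cross-check each (left, op, right, claimed) step against exact arithmetic.

    Later steps read earlier results through '#k' references, so they always
    see the corrected value and a single slip cannot propagate down the chain.
    Returns the verified results and a log of every deviation found.
    """
    results, log = [], []
    for k, (left, op, right, claimed) in enumerate(steps):
        exact = OPS[op](resolve(left, results), resolve(right, results))
        if Fraction(claimed) != exact:
            log.append(f"step {k}: claimed {claimed}, corrected to {exact}")
        results.append(exact)
    return results, log


# A Game of 24 chain in which the model rounded its intermediate values.
steps = [
    (8, "/", 3, "2.67"),
    (3, "-", "#0", "0.33"),
    (8, "/", "#1", 24),
]
results, log = refine(steps)
```

Here steps 0 and 1 are flagged and replaced by 8/3 and 1/3. The final step 8 / (1/3) = 24 passes, because it reads the corrected value rather than the rounded 0.33.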

Section 06

Application Scenarios and Deployment Challenges

Applicable Scenarios:

  1. Reasoning tasks requiring precise calculations (mathematics, physics, etc.);
  2. Problems requiring systematic search (planning, scheduling);
  3. Batch processing of problems that share recurring reasoning patterns;
  4. Resource-constrained environments (edge devices, small teams).

Deployment Challenges:

  • Additional computational overhead for memory retrieval and symbolic execution;
  • Storage cost of the accumulated historical memory;
  • Sandboxing required to execute model-generated code safely.
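When generated code only needs to do arithmetic, one common way to narrow the execution risk is to avoid `eval`/`exec` entirely. Instead, the program interprets a whitelisted subset of Python's syntax tree. The sketch below is a hypothetical helper, `safe_eval`, and not part of eMoT as described. It accepts only integer literals, +, -, *, / and unary minus, and evaluates them exactly with `Fraction`. It is not a general sandbox: a framework that lets the model run arbitrary Python would still need process-level isolation.

```python
import ast
import operator
from fractions import Fraction

# Only these binary operators are allowed; everything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> Fraction:
    """Evaluate +, -, *, / over integer literals exactly; reject all other syntax."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Names, calls, attributes, powers, etc. never get executed.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))
```

`safe_eval("8 / (3 - 8 / 3)")` returns `Fraction(24, 1)`. Input such as `__import__('os').getcwd()` is rejected with a `ValueError` before anything runs. Exponentiation is deliberately excluded, because an expression like `9 ** 9 ** 9` could consume enormous time and memory.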

Section 07

Limitations and Future Directions

Current Limitations:

  1. Domain generalization is not yet verified (performance on out-of-distribution scenarios is unknown);
  2. Hyperparameter sensitivity (e.g., memory corrosion rate requires task-specific tuning);
  3. Interpretability of memory content needs improvement.
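The corrosion rate's sensitivity can be made concrete with a simple model. The article gives no formula, so the following is an assumption. Suppose each memory weight w decays by a factor λ (0 < λ < 1) per step and gains a bonus β whenever the memory is used (u_t = 1 if used at step t, else 0):

```latex
w_{t+1} = \lambda\, w_t + \beta\, u_t

% unused memory: geometric decay and its half-life
w_t = \lambda^{t} w_0, \qquad t_{1/2} = \frac{\ln 2}{\ln(1/\lambda)}

% memory used at every step: steady-state weight
w^{*} = \lambda\, w^{*} + \beta \;\Rightarrow\; w^{*} = \frac{\beta}{1 - \lambda}
```

With λ = 0.9, an unused pattern loses half its weight in about 6.6 steps; with λ = 0.99, it takes about 69. A rate that suits a stream of short, repetitive problems may therefore forget too quickly on a task where useful patterns recur only rarely.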

Future Directions:

  1. Hierarchical memory (separate long-term and working-memory tiers);
  2. Multi-agent collaboration with shared memory;
  3. Continual learning (online memory updates without catastrophic forgetting);
  4. Cross-modal expansion (vision, audio, etc.).

Section 08

Conclusion: Model Scale Is Not the Only Key; Thoughtful Design Matters Just as Much

eMoT represents a new direction for LLM reasoning enhancement. By combining dynamic memory with symbolic computation, it lets lightweight models outperform large models. Its 100% accuracy on Game of 24 demonstrates the value of structured reasoning control. It also indicates that model scale is not the only determinant of reasoning ability: well-designed architectures and training strategies matter just as much. The approach offers a "small but powerful" methodology for resource-constrained scenarios and is expected to be applied in more fields in the future.