Zing Forum

Reason to Play: A Study on Behavioral and Neural Alignment Between Cutting-Edge Reasoning Models and Human Game Learners

This paper evaluates how closely cutting-edge Large Reasoning Models (LRMs) match human learning, using behavior from a complex video game and fMRI data. The study finds that LRMs significantly outperform deep reinforcement learning agents at matching human behavioral patterns and predicting brain activity, offering a new computational model for understanding human learning and decision-making.

Tags: Large Reasoning Models · Neural Alignment · Human Learning · fMRI · Reinforcement Learning · Cognitive Science · Game Learning
Published 2026-05-09 01:07 · Recent activity 2026-05-11 11:24 · Estimated read 7 min

Section 01

【Introduction】A Study on Behavioral and Neural Alignment Between Cutting-Edge Reasoning Models and Human Game Learners

Humans learn new games by discovering hidden rules, revising hypotheses, and planning ahead. This study asks whether cutting-edge Large Reasoning Models (LRMs) learn in a comparably human way, evaluating them against complex game behavior and fMRI data. LRMs turn out to outperform deep reinforcement learning agents at matching behavioral patterns and predicting brain activity, providing a new computational model for understanding human learning and decision-making.

Section 02

Unique Capabilities of Human Learning and the Challenge of AI Replication

Unique Capabilities of Human Learning

Humans learn remarkably well in new environments; the core features of this ability include:

  • Rapid rule discovery: Inferring underlying rules and patterns from limited observations
  • Hypothesis revision: Updating internal models based on new evidence
  • Multi-step planning: Prospective action planning based on knowledge

AI researchers have long tried to replicate these abilities, but whether modern AI systems can learn and plan like humans remains an open question.

Section 03

Research Design: Triple Evaluation of Game, Behavior, and Brain Activity

Experimental Task: Participants learn a novel video game governed by hidden rules; success requires revising hypotheses and planning several steps ahead, capturing the challenges of exploration and decision-making under uncertainty.

Triple Evaluation Framework:

  1. Game Ability: Can the model learn to play the game and achieve good results?
  2. Behavioral Matching: Is the model's learning process similar to human behavioral patterns?
  3. Neural Alignment: Can the model's internal representations predict human brain activity?
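
The paper's exact analysis pipeline is not given here, but neural alignment of this kind is commonly tested with a linear encoding model: regress the model's internal representations onto voxel responses and score predictions on held-out trials. A minimal sketch on synthetic data (all names, shapes, and the ridge setup are illustrative assumptions, not the study's actual code):

```python
import numpy as np

def encoding_score(features, voxels, alpha=1.0):
    """Ridge encoding model: fit features -> voxel activity on the first
    half of trials, return the per-voxel correlation on the held-out half."""
    n, d = features.shape
    Xtr, Xte = features[:n // 2], features[n // 2:]
    Ytr, Yte = voxels[:n // 2], voxels[n // 2:]
    # Closed-form ridge: W = (X'X + alpha*I)^(-1) X'Y
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(d), Xtr.T @ Ytr)
    pred = Xte @ W
    pc, yc = pred - pred.mean(0), Yte - Yte.mean(0)
    return (pc * yc).sum(0) / np.sqrt((pc ** 2).sum(0) * (yc ** 2).sum(0))

rng = np.random.default_rng(0)
hidden = rng.standard_normal((200, 16))     # model representations, one row per trial
bold = hidden @ rng.standard_normal((16, 50)) + 0.5 * rng.standard_normal((200, 50))
r = encoding_score(hidden, bold)            # one correlation per simulated voxel
print(r.shape, round(float(r.mean()), 2))
```

The higher these held-out correlations, the better the model's representations account for the measured brain activity.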

Section 04

Types of AI Models Evaluated

Evaluated Models: From Reinforcement Learning to Reasoning Models

Cutting-edge Large Reasoning Models (LRMs): Possess strong language understanding, generation, and complex reasoning/planning capabilities, and are the focus of the study.

Deep Reinforcement Learning Agents: Include model-free and model-based types, optimizing behavior through trial and error.
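
The trial-and-error loop at the heart of model-free reinforcement learning fits in a few lines. A toy tabular Q-learning sketch on a hypothetical 4-state chain (the paper's agents are deep networks trained on the actual game; this only illustrates the update rule):

```python
import random

# Toy model-free Q-learning on a 4-state chain (hypothetical task, not the
# paper's game). Action 0 moves left, action 1 moves right; reaching the
# rightmost state pays reward 1 and ends the episode.
N_STATES, GOAL, ACTIONS = 4, 3, (0, 1)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def train(episodes=300, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy: mostly exploit the current estimates, sometimes explore
            a = rng.choice(ACTIONS) if rng.random() < eps else max(ACTIONS, key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # Trial-and-error update: nudge Q toward the bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train()
print([round(max(Q[s]), 2) for s in range(GOAL)])  # values rise toward the goal
```

Note that the agent never represents the rules themselves; it only accumulates value estimates from experienced outcomes, which is the contrast the study draws against reasoning models.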

Bayesian Theory Agents: Based on probabilistic reasoning, explicitly maintaining a probability distribution of rules and performing Bayesian updates.
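
The update such an agent performs is just Bayes' rule over a hypothesis space of candidate rules. A minimal illustration with three made-up rules (not the paper's actual hypothesis space):

```python
import numpy as np

# Sketch of a Bayesian rule learner (illustrative, not the paper's agent).
# The hidden rule is one of three hypothetical candidates; each rule assigns
# a reward probability to actions 0..2.
rules = np.array([
    [0.9, 0.1, 0.1],   # rule A: action 0 is usually rewarded
    [0.1, 0.9, 0.1],   # rule B: action 1 is usually rewarded
    [0.1, 0.1, 0.9],   # rule C: action 2 is usually rewarded
])

def bayes_update(prior, likelihood):
    """One Bayesian update: posterior ∝ prior × P(observation | rule)."""
    post = prior * likelihood
    return post / post.sum()

posterior = np.ones(3) / 3                    # uniform prior over rules
for action, rewarded in [(1, True), (1, True), (0, False)]:
    lik = rules[:, action] if rewarded else 1 - rules[:, action]
    posterior = bayes_update(posterior, lik)
print(posterior.round(3))                     # mass concentrates on rule B
```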

Section 05

Core Findings: LRMs Demonstrate Excellent Human Similarity

  • Behavioral Pattern Matching: LRMs' learning trajectories are closest to humans, including exploration methods, strategy adjustments, and rule understanding processes.
  • Brain Activity Prediction Advantage: LRMs' internal representations predict human neural activity significantly better than those of reinforcement learning agents, across both cortical and subcortical regions.
  • Robustness: Permutation control experiments verify the reliability of the results.
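
A permutation control of this kind can be sketched simply: break the trial correspondence between the two measurements by shuffling, recompute the alignment score under each shuffle, and locate the true score within that null distribution. An illustrative sketch on synthetic data (not the study's pipeline):

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Shuffle x relative to y to build a null distribution of correlations,
    then return the observed correlation and a one-sided p-value."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(n_perm)])
    # One-sided p-value with the standard +1 correction
    p = (1 + (null >= observed).sum()) / (1 + n_perm)
    return observed, p

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = x + rng.standard_normal(100)   # genuinely correlated pair
obs, p = permutation_pvalue(x, y)
print(round(obs, 2), round(p, 4))  # true score falls far outside the null
```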

Section 06

Mechanism Exploration: Neural Alignment Stems from Contextual Representations

Mechanism Exploration: Representation vs. Reasoning

The study found that brain activity alignment mainly reflects the model's contextual representation of game states, rather than downstream planning or reasoning processes. This suggests that LRMs encode world information in a way similar to the human brain, which is key to human-like intelligence.
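
One way to make this representation-versus-reasoning contrast concrete is to fit separate encoding models on contextual state features and on downstream "plan" features and compare their held-out fits. A hypothetical sketch with synthetic data, where the "brain" signal is constructed to depend on the state representation (the feature names and the tanh "planning" stage are assumptions for illustration only):

```python
import numpy as np

def holdout_r2(X, Y, alpha=1.0):
    """Ridge fit on the first half of trials, aggregate R^2 on the rest."""
    n, d = X.shape
    W = np.linalg.solve(X[:n // 2].T @ X[:n // 2] + alpha * np.eye(d),
                        X[:n // 2].T @ Y[:n // 2])
    resid = Y[n // 2:] - X[n // 2:] @ W
    return 1 - resid.var() / Y[n // 2:].var()

rng = np.random.default_rng(2)
state = rng.standard_normal((200, 8))                 # contextual state features
plan = np.tanh(state @ rng.standard_normal((8, 8)))   # downstream "plan" features
# Synthetic "brain" driven by the state representation, not the plan
brain = state @ rng.standard_normal((8, 30)) + rng.standard_normal((200, 30))
print(holdout_r2(state, brain), holdout_r2(plan, brain))
```

In this toy setup the state-feature model scores higher, mirroring the paper's claim that alignment tracks how the model encodes the game state rather than what it computes downstream.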

7

Section 07

Theoretical Significance and Research Limitations

Theoretical Significance: LRMs provide a new computational model of human cognition and can generate testable hypotheses that advance cognitive science.

Limitations:

  • Task scope is limited to simple video games
  • The mechanism of neural alignment is not yet clear
  • Individual differences are not fully considered

Future Directions: Expand to complex real-world tasks, explore alignment mechanisms, and study individual differences.

Section 08

Summary and Future Outlook

Through its triple evaluation framework, this study provides the first systematic demonstration that LRMs align with human learners at both the behavioral and neural levels. This opens new directions for AI and cognitive science: LRMs may capture core features of human cognition and serve as a bridge between artificial and human intelligence. Future work may show LRMs simulating human cognition even more capably.