
A Study on the Adaptability of Large Language Models in Non-Stationary Environments: Rigid Behaviors Revealed by Reversal Learning Experiments

Using a probabilistic reversal learning task, the study finds that mainstream large language models (LLMs) show marked adaptive rigidity when the environment changes and are considerably less sensitive to negative feedback than humans. The result offers a new perspective on how to evaluate the dynamic decision-making capabilities of LLMs.

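Reversal learning · Large language models · Non-stationary environments · Adaptability · Reinforcement learning · Decision-making behavior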

Published 2026-04-06 00:53 · Recent activity 2026-04-07 15:29 · Estimated read 12 min


Section 01

[Introduction] Study on Adaptive Rigidity of Large Language Models in Non-Stationary Environments

Using a probabilistic reversal learning task, the study finds that mainstream large language models (LLMs) show marked adaptive rigidity when the environment changes: they are considerably less sensitive to negative feedback than humans. This gives a new perspective for evaluating the dynamic decision-making capabilities of LLMs. The work exposes the decision-making limitations of LLMs in non-stationary environments and is a useful reference for making AI systems more adaptive.

Section 02

Research Background: Non-Stationary Environments and Reversal Learning Paradigm

Research Background: Decision-Making Challenges in Non-Stationary Environments

Real-world decision environments often change over time. Today's optimal choice may become suboptimal, or even wrong, tomorrow as conditions shift. This non-stationarity severely tests the adaptability of intelligent systems. Humans can flexibly adjust their strategies when the environment changes. How do artificial intelligence systems, especially large language models (LLMs), perform in such dynamic settings?

Reversal learning is a classic paradigm in cognitive science for studying adaptive decision-making. Participants must learn which of several options has the higher reward probability. When the reward contingencies suddenly reverse, they must quickly adjust their choices. This makes the paradigm well suited to measuring how flexibly an agent learns when its environment changes.
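To make the paradigm concrete, here is a minimal two-option reversal task in Python. It is a sketch under stated assumptions, not the study's environment. The class name `ReversalBandit` and its parameters are hypothetical:

- `p_high` is the reward probability of the better option.
- `block_len` is a fixed reversal interval.

Both parameters are illustrative defaults.

```python
import random


class ReversalBandit:
    """Minimal two-option probabilistic reversal task (illustrative sketch).

    Assumed, not taken from the study: the better option pays off with
    probability p_high, the other with 1 - p_high, and the two swap roles
    every `block_len` trials.
    """

    def __init__(self, p_high=0.8, block_len=40, seed=0):
        self.p_high = p_high
        self.block_len = block_len
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.best = 0                   # index of the currently better option
        self.t = 0                      # trials completed so far

    def step(self, choice):
        """Return 1 (reward) or 0 (no reward) for choosing option 0 or 1."""
        p = self.p_high if choice == self.best else 1 - self.p_high
        reward = int(self.rng.random() < p)
        self.t += 1
        if self.t % self.block_len == 0:
            self.best = 1 - self.best  # reversal: contingencies swap
        return reward


env = ReversalBandit()
rewards = [env.step(0) for _ in range(80)]  # an agent that never switches
```

An agent that keeps choosing option 0 is rewarded often in the first block. After the reversal at trial 40, it is rewarded only rarely. This drop in reward is the signal a flexible learner should act on.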

Section 03

Experimental Design: Two-Option Reversal Learning Task and Multi-Model Comparison

Experimental Design: Multi-Model Comparison and Human Benchmark

The study used a two-option probabilistic reversal learning task with three latent states and two switching triggers: performance-based switching and timeout-based switching. The researchers compared two conditions:

- a deterministic schedule with fixed transition cycles;
- a random transition schedule, which makes the environment more volatile.
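The two switching triggers can be sketched as a small scheduler. This is one plausible reading, not the study's specification. The name `SwitchScheduler` and all thresholds are hypothetical:

- A performance-based reversal fires after `criterion` consecutive correct choices.
- A timeout-based reversal fires after `timeout` trials in the current state.
- Under the "fixed" schedule both thresholds stay constant. Under the "random" schedule they are redrawn after every reversal.

The sketch does not model the study's three latent states.

```python
import random


class SwitchScheduler:
    """Decides when the reward contingency reverses (illustrative sketch).

    Assumptions (not the study's parameters): a reversal fires after
    `criterion` consecutive correct choices (performance trigger) or after
    `timeout` trials in the current state (timeout trigger). Under the
    'random' schedule both thresholds are redrawn after each reversal.
    """

    def __init__(self, schedule="fixed", criterion=8, timeout=30,
                 criterion_range=(5, 11), timeout_range=(20, 41), seed=0):
        if schedule not in ("fixed", "random"):
            raise ValueError("schedule must be 'fixed' or 'random'")
        self.schedule = schedule
        self.rng = random.Random(seed)
        self.base = (criterion, timeout)
        self.ranges = (criterion_range, timeout_range)
        self._reset_state()

    def _reset_state(self):
        self.streak = 0           # consecutive correct choices in this state
        self.trials_in_state = 0  # trials since the last reversal
        if self.schedule == "fixed":
            self.criterion, self.timeout = self.base
        else:
            # randrange excludes the upper bound of each range
            self.criterion = self.rng.randrange(*self.ranges[0])
            self.timeout = self.rng.randrange(*self.ranges[1])

    def update(self, correct):
        """Record one trial; return the trigger name if a reversal fires."""
        self.trials_in_state += 1
        self.streak = self.streak + 1 if correct else 0
        if self.streak >= self.criterion:
            trigger = "performance"
        elif self.trials_in_state >= self.timeout:
            trigger = "timeout"
        else:
            return None
        self._reset_state()
        return trigger
```

The timeout trigger guarantees that reversals still happen when an agent never reaches the performance criterion. A rigid agent would otherwise be stuck in a single state.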

The tested models include three current mainstream large language models:

DeepSeek-V3.2
Gemini-3
GPT-5.2

Human data served as a behavioral benchmark for comparing the decision-making behavior of the LLMs with human cognitive patterns.

Section 04

Key Findings: Adaptive Rigidity and Behavioral Asymmetry of LLMs

Key Findings: Asymmetric Evidence Use and Adaptive Rigidity

Asymmetry of Win-Stay and Lose-Shift

The results show a striking pattern across all tested models. Win-stay behavior (choosing the same option again after a reward) is near ceiling. Lose-shift behavior (switching to the other option after an unrewarded trial) is markedly weaker.

This asymmetry indicates a systematic bias in how LLMs use positive versus negative evidence: the models exploit successes well but respond sluggishly to failures. This contrasts with human behavior. Humans are usually more sensitive to losses, and this loss aversion has adaptive significance from an evolutionary standpoint.
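Win-stay and lose-shift rates are simple conditional frequencies over a choice sequence. The sketch below computes them, assuming choices are option indices and rewards are 0/1 outcomes. The function name is hypothetical. The paper's exact estimator is not described here.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (P(stay | win), P(shift | loss)) for a trial sequence.

    choices: option index per trial; rewards: 0/1 outcome per trial.
    The outcome of trial t is paired with whether trial t+1 repeats
    the choice of trial t. A rate is NaN if its condition never occurs.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    wins = stays_after_win = losses = shifts_after_loss = 0
    for t in range(len(choices) - 1):
        repeated = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            stays_after_win += repeated
        else:
            losses += 1
            shifts_after_loss += not repeated
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


ws, ls = win_stay_lose_shift([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 1, 1])
# ws == 1.0 (every win was followed by a stay)
# ls == 0.5 (one of the two losses was followed by a shift)
```

The asymmetry described above corresponds to `win_stay` close to 1 together with a much lower `lose_shift`.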

Inter-Model Differences: From Extreme Stubbornness to Relative Flexibility

DeepSeek-V3.2 showed the most extreme pattern of the three. After a reversal it perseverated severely, continuing to choose the previously rewarded option, and its overall learning of the task contingencies was also weak. Gemini-3 and GPT-5.2 adapted faster, although their sensitivity to losses remained below human levels.

This suggests that differences in architecture and training may produce substantially different behavior in dynamic environments.
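Post-reversal perseveration can be quantified as a per-reversal count. For each reversal, count the trials on which the agent keeps choosing the previously better option before it first switches. The sketch below is one common way to do this, not necessarily the study's measure. The function name is hypothetical. It assumes the per-trial index of the better option is known, and it stops each count at the next reversal.

```python
def perseveration_runs(choices, best_option):
    """Per reversal, count consecutive choices of the old best option.

    choices: option index chosen on each trial.
    best_option: index of the currently better option on each trial;
    a reversal is any trial where this value changes.
    Each count stops at the first switch or at the next reversal.
    """
    if len(choices) != len(best_option):
        raise ValueError("sequences must have the same length")
    reversals = [t for t in range(1, len(best_option))
                 if best_option[t] != best_option[t - 1]]
    runs = []
    for i, t in enumerate(reversals):
        end = reversals[i + 1] if i + 1 < len(reversals) else len(choices)
        old_best = best_option[t - 1]
        run = 0
        for c in choices[t:end]:
            if c != old_best:
                break
            run += 1
        runs.append(run)
    return runs


best = [0, 0, 0, 1, 1, 1, 1, 0, 0]
picks = [0, 0, 0, 0, 0, 1, 1, 1, 0]
runs = perseveration_runs(picks, best)  # [2, 1]
```

A strongly perseverative agent, like the pattern reported for DeepSeek-V3.2, would produce long runs, often lasting until the next reversal.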

Coexistence of High Returns and Rigid Adaptation

Notably, random transitions increased the LLMs' stubbornness after reversals but did not consistently reduce their total number of wins. High aggregate returns and rigid adaptation can therefore coexist. The models may sustain overall performance through other strategies (such as exploiting short-term fluctuations) rather than by genuinely learning to adapt flexibly to environmental change.

Section 05

Mechanism Analysis: Three Mechanisms Leading to Adaptive Rigidity

Mechanism Analysis: Hierarchical Reinforcement Learning Modeling

To understand the mechanisms behind these behaviors, the researchers fit a hierarchical reinforcement learning (hierarchical RL) model to the data. The analysis identified three separable mechanisms that produce adaptive rigidity:

Weak Loss Learning

The models have a low learning rate for negative feedback, so they learn only slowly from mistakes. This mechanism directly explains the weakened lose-shift behavior.
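A common way to formalize weak loss learning is a Rescorla-Wagner update with separate learning rates for positive and negative prediction errors. The sketch below assumes that form. The study's exact parameterization may differ, and the function names are hypothetical.

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """One Rescorla-Wagner update with asymmetric learning rates."""
    delta = reward - q                             # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg  # pick rate by sign
    return q + alpha * delta


def losses_until_below(q, threshold, alpha_neg):
    """Count consecutive unrewarded trials before a value estimate that
    starts at q drops below threshold."""
    if not 0 < alpha_neg <= 1 or threshold <= 0:
        raise ValueError("need 0 < alpha_neg <= 1 and threshold > 0")
    n = 0
    while q >= threshold:
        # reward 0 gives delta <= 0 here, so only alpha_neg is used
        q = update_value(q, 0.0, alpha_pos=0.5, alpha_neg=alpha_neg)
        n += 1
    return n


fast = losses_until_below(0.8, 0.5, alpha_neg=0.5)   # 1
slow = losses_until_below(0.8, 0.5, alpha_neg=0.05)  # 10
```

With these assumed numbers, a value of 0.8 falls below 0.5 after a single loss when `alpha_neg = 0.5`, but only after 10 losses when `alpha_neg = 0.05`. This illustrates how a small negative learning rate delays switching.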

Strategy Determinism Inflation

The models' choice distributions are overly concentrated, and they explore too little. Because their behavior is so deterministic, the models struggle to change course even in the face of negative feedback.
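In RL models, policy determinism is usually controlled by the inverse temperature `beta` of a softmax choice rule. The sketch assumes that standard form, which may not match the study's exact policy. The function name is hypothetical.

```python
import math


def softmax_choice_probs(q_values, beta):
    """Softmax policy: a larger inverse temperature beta makes choices
    more deterministic (less exploration)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


soft = softmax_choice_probs([0.6, 0.4], beta=1.0)    # about [0.55, 0.45]
sharp = softmax_choice_probs([0.6, 0.4], beta=20.0)  # about [0.98, 0.02]
```

For the same value gap of 0.2, raising `beta` from 1 to 20 moves the preferred option's probability from about 0.55 to about 0.98. An agent in that regime rarely samples the other option, even after that option has become better.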

Value Polarization Caused by Counterfactual Suppression

The models' value estimates for unchosen options are biased. They suppress counterfactual reasoning (i.e., "what if I had chosen the other option?"), which pushes their value judgments toward extremes.
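Counterfactual (fictive) updating is often modeled with its own learning rate. The unchosen option moves toward the outcome it would have produced. The sketch assumes the two options' outcomes are anti-correlated, so the fictive outcome is `1 - reward`. This is a common formalization, not necessarily the study's. The function name and parameter `alpha_cf` are hypothetical.

```python
def update_pair(q, choice, reward, alpha, alpha_cf):
    """Update both option values after one trial of a two-option task.

    The chosen option moves toward the observed reward. The unchosen
    option moves toward the fictive outcome 1 - reward (assumes
    anti-correlated outcomes). alpha_cf = 0 disables counterfactual
    updating entirely.
    """
    q = list(q)  # do not mutate the caller's list
    other = 1 - choice
    q[choice] += alpha * (reward - q[choice])
    q[other] += alpha_cf * ((1 - reward) - q[other])
    return q


# After a reversal: the formerly good option 0 goes unrewarded.
no_cf = update_pair([0.8, 0.2], choice=0, reward=0, alpha=0.5, alpha_cf=0.0)
with_cf = update_pair([0.8, 0.2], choice=0, reward=0, alpha=0.5, alpha_cf=0.5)
# no_cf ~ [0.4, 0.2]: the unchosen value stays frozen at its old level
# with_cf ~ [0.4, 0.6]: the unchosen option now looks better
```

When counterfactual updating is suppressed, the unchosen option's value stays at whatever it was when the agent stopped choosing it. This preserves an extreme gap between the two values and removes one route to switching.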

These three mechanisms can act independently or together to cause the observed rigid adaptive behavior.
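Fitting an RL model means finding parameters that maximize the likelihood of the observed choices. Below is a sketch of a single-agent negative log-likelihood that combines the three mechanisms:

- asymmetric learning rates `alpha_pos` and `alpha_neg`;
- a counterfactual rate `alpha_cf`;
- a softmax inverse temperature `beta`.

This parameterization is an assumption, not the study's published model. A hierarchical fit, as in the study, would add group-level priors over these parameters, which this sketch omits. The function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_log_likelihood(params, choices, rewards):
    """NLL of a two-option choice sequence under an RL model with
    asymmetric learning rates, counterfactual updating, and softmax."""
    alpha_pos, alpha_neg, alpha_cf, beta = params
    q = [0.5, 0.5]  # assumed neutral starting values
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        other = 1 - choice
        # two-option softmax: P(choice) = sigmoid(beta * value difference)
        nll -= log_sigmoid(beta * (q[choice] - q[other]))
        # chosen option: learning rate depends on the prediction-error sign
        delta = reward - q[choice]
        q[choice] += (alpha_pos if delta > 0 else alpha_neg) * delta
        # unchosen option: fictive outcome 1 - reward
        q[other] += alpha_cf * ((1 - reward) - q[other])
    return nll


# A perseverative sequence: five wins, then five losses without switching.
choices = [0] * 10
rewards = [1] * 5 + [0] * 5
rigid = negative_log_likelihood((0.5, 0.05, 0.0, 5.0), choices, rewards)
flexible = negative_log_likelihood((0.5, 0.8, 0.0, 5.0), choices, rewards)
```

Here `rigid` is lower than `flexible`: a low `alpha_neg` explains the perseverative sequence better. In practice, the function could be passed to an optimizer such as `scipy.optimize.minimize` with `method="L-BFGS-B"` and `bounds` that keep the learning rates in [0, 1] and `beta` non-negative.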

Section 06

Research Significance and Future Directions: Implications from Evaluation to AI Safety

Research Significance and Future Directions

Implications for LLM Evaluation

This study underscores that evaluations of large language models should pay particular attention to performance in non-stationary environments. Traditional static benchmarks may miss a model's weaknesses in adapting to change. The researchers recommend developing reversal-sensitive diagnostic tools and volatility-aware evaluation methods to test the decision-making capabilities of LLMs more comprehensively.

Implications for AI Safety

AI systems that are overly stubborn when the environment changes could pose risks in practice. In autonomous driving, medical diagnosis, or financial trading, for example, a system must quickly recognize that conditions have changed and adjust its strategy. Understanding and reducing the adaptive rigidity of LLMs is therefore important for building more reliable AI systems.

Future Research Directions

This study opens several directions for future work:

- training methods that improve models' sensitivity to losses;
- dedicated techniques for enhancing adaptability;
- extensions of the reversal learning paradigm to more complex multi-step decision-making tasks.

Section 07

Conclusion: Key Findings and Reference Value of LLM Adaptive Rigidity

Conclusion

Through systematic reversal learning experiments, this study reveals the adaptive rigidity that mainstream large language models exhibit in non-stationary environments. These models perform well on static tasks. However, they are clearly limited in using negative feedback and in adjusting their strategies quickly. The finding deepens our understanding of how LLMs make decisions and is an important reference for developing more adaptive AI systems.
