# A Study on the Adaptability of Large Language Models in Non-Stationary Environments: Rigid Behaviors Revealed by Reversal Learning Experiments

> Using a probabilistic reversal learning task, the study finds that mainstream large language models adapt rigidly when the environment changes and are markedly less sensitive to negative feedback than humans, offering a new perspective for evaluating the dynamic decision-making capabilities of LLMs.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-05T16:53:23.000Z
- Last activity: 2026-04-07T07:29:57.450Z
- Popularity: 108.4
- Keywords: reversal learning, large language models, non-stationary environments, adaptability, reinforcement learning, decision-making behavior
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2604-04182v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2604-04182v1
- Markdown source: floors_fallback

---

## [Introduction] Study on Adaptive Rigidity of Large Language Models in Non-Stationary Environments

Using a probabilistic reversal learning task, the study finds that mainstream large language models (LLMs) adapt rigidly when the environment changes and are markedly less sensitive to negative feedback than humans. These results expose a limitation of LLM decision-making in non-stationary environments and offer a useful reference for evaluating and improving the adaptability of AI systems.

## Research Background: Non-Stationary Environments and Reversal Learning Paradigm

Real-world decision-making environments often change over time. Today's optimal choice may become suboptimal, or simply wrong, tomorrow because conditions have shifted. This non-stationarity is a demanding test of an intelligent system's adaptability. Humans can flexibly adjust their strategies when the environment changes, but how do artificial intelligence systems, especially large language models (LLMs), perform in such dynamic environments?

Reversal learning is a classic paradigm in cognitive science for studying adaptive decision-making. Participants learn to choose the option with the higher reward probability, and when the reward contingencies suddenly reverse, they must quickly adjust their strategy. The paradigm is therefore well suited to evaluating how flexibly an agent learns when its environment changes.

## Experimental Design: Two-Option Reversal Learning Task and Multi-Model Comparison

This study designed a two-option probabilistic reversal learning task with three latent states and two switch triggers: performance-based switching and timeout-based switching. The researchers compared two conditions, a deterministic fixed transition cycle and a random transition schedule, with the latter increasing environmental volatility.
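To make the task structure concrete, the following is a minimal sketch of such an environment, not the authors' implementation. The class name `ReversalTask`, the reward probabilities, the streak criterion, the timeout length, and the reading of "performance-based switching" as a run of consecutive best-option choices are all assumptions, since this summary does not report those details.

```python
import random


class ReversalTask:
    """Two-option probabilistic reversal task (illustrative sketch only)."""

    def __init__(self, state_probs=((0.8, 0.2), (0.2, 0.8), (0.7, 0.3)),
                 criterion=5, timeout=30, random_schedule=False, seed=0):
        # All defaults are hypothetical; the summary does not report them.
        self.state_probs = state_probs  # P(reward) for options 0 and 1, per latent state
        self.criterion = criterion      # consecutive best-option choices that trigger a switch
        self.timeout = timeout          # trials in one state before a forced switch
        self.random_schedule = random_schedule  # False: fixed cycle, True: random next state
        self.rng = random.Random(seed)
        self.state = 0
        self.streak = 0
        self.trials_in_state = 0

    def step(self, choice):
        """Return a 0/1 reward for `choice`, then switch state if a trigger fires."""
        probs = self.state_probs[self.state]
        reward = int(self.rng.random() < probs[choice])
        best = 0 if probs[0] >= probs[1] else 1
        self.streak = self.streak + 1 if choice == best else 0
        self.trials_in_state += 1
        if self.streak >= self.criterion or self.trials_in_state >= self.timeout:
            self._switch()
        return reward

    def _switch(self):
        n = len(self.state_probs)
        if self.random_schedule:
            # Random schedule: jump to any other state with equal probability.
            others = [s for s in range(n) if s != self.state]
            self.state = self.rng.choice(others)
        else:
            # Fixed cycle: visit the states in order.
            self.state = (self.state + 1) % n
        self.streak = 0
        self.trials_in_state = 0
```

An agent would call `task.step(choice)` once per trial and observe only the returned reward, never `task.state`, so a reversal has to be inferred from feedback alone.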

The tested models include three current mainstream large language models:
- DeepSeek-V3.2
- Gemini-3
- GPT-5.2

Human data served as a behavioral reference benchmark for comparing the decision-making behavior of LLMs with human cognitive patterns.

## Key Findings: Adaptive Rigidity and Behavioral Asymmetry of LLMs

### Asymmetry of Win-Stay and Lose-Shift

The experimental results show a striking pattern: across all tested models, "win-stay" behavior (choosing the same option again after receiving a reward) is close to ceiling, while "lose-shift" behavior (switching to the other option after not receiving a reward) is markedly weaker.
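The two rates can be computed directly from a trial log. The sketch below follows the definitions given above; the function name and the list-based input format are illustrative assumptions, not taken from the paper.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay_rate, lose_shift_rate) from parallel per-trial lists."""
    win_stay = wins = lose_shift = losses = 0
    # The last trial has no following choice, so it is not scored.
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            win_stay += stayed
        else:
            losses += 1
            lose_shift += not stayed
    ws = win_stay / wins if wins else float("nan")
    ls = lose_shift / losses if losses else float("nan")
    return ws, ls


print(win_stay_lose_shift([0, 0, 0, 0, 1, 1], [1, 1, 0, 0, 1, 0]))  # (1.0, 0.5)
```

In this toy log the agent repeats its choice after every win but switches after only one of two scored losses, which is the kind of asymmetry the study reports.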

This asymmetry reveals a systematic bias in how LLMs use positive and negative evidence. The models make good use of successes, but their response to failures is comparatively slow. This contrasts with human behavior: humans are usually more sensitive to losses, and this loss aversion is thought to be evolutionarily adaptive.

### Inter-Model Differences: From Extreme Stubbornness to Relative Flexibility

Among the three models, DeepSeek-V3.2 showed the most extreme pattern: it exhibited severe perseveration after a reversal, i.e., it kept choosing the previously rewarded option, and its initial acquisition of the task was also weak. In contrast, Gemini-3 and GPT-5.2 adapted faster, although their sensitivity to losses was still lower than that of humans.
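One simple way to quantify perseveration is to count how many consecutive trials after a reversal the agent keeps choosing the previously rewarded option. The paper's exact metric is not given in this summary, so the function below is only an illustrative assumption.

```python
def perseverative_run(choices, reversal_trial, old_best):
    """Count consecutive choices of `old_best` starting at `reversal_trial`."""
    run = 0
    for choice in choices[reversal_trial:]:
        if choice != old_best:
            break
        run += 1
    return run


# The reversal happens at trial index 3; the agent sticks with option 0 twice more.
print(perseverative_run([0, 0, 0, 0, 0, 1, 1], reversal_trial=3, old_best=0))  # 2
```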

This finding suggests that different architectures and training methods may lead to fundamental differences in how models behave in dynamic environments.

### Coexistence of High Returns and Rigid Adaptation

An interesting finding is that random transitions increased the LLMs' perseveration after reversals but did not consistently reduce the total number of wins. High aggregate returns and rigid adaptation can therefore coexist: the models may maintain overall performance through other strategies (such as exploiting short-term fluctuations) rather than by truly learning to adapt flexibly to environmental changes.

## Mechanism Analysis: Three Mechanisms Leading to Adaptive Rigidity

To understand the mechanisms behind these behaviors, the researchers fitted a hierarchical reinforcement learning (RL) model to the data. The analysis revealed three separable mechanisms that lead to adaptive rigidity:

### Weak Loss Learning

The models have a low learning rate for negative feedback, so they learn from mistakes only slowly. This mechanism directly explains the attenuation of "lose-shift" behavior.
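A common way to express this idea in RL models is a value update with separate learning rates for rewarded and unrewarded outcomes. The sketch below uses that standard formulation as an assumption; it is not necessarily the parameterization fitted in the study, and the names `alpha_gain` and `alpha_loss` and their default values are hypothetical.

```python
def update_value(q, reward, alpha_gain=0.5, alpha_loss=0.1):
    """Move the value estimate q toward reward with an outcome-dependent learning rate."""
    alpha = alpha_gain if reward > 0 else alpha_loss
    return q + alpha * (reward - q)


q = 0.8  # value of the previously rewarded option at the moment of reversal
for _ in range(3):
    q = update_value(q, reward=0)  # three unrewarded trials in a row
print(round(q, 4))  # 0.5832
```

With `alpha_loss=0.1`, three consecutive losses lower the value only from 0.8 to about 0.58, whereas `alpha_loss=0.5` would bring it down to 0.1. A small loss learning rate therefore keeps the old option attractive long after it has stopped paying off.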

### Inflated Policy Determinism

The models' policy distribution is too concentrated, leaving little room for exploration. Because their choices are so deterministic, the models struggle to change their behavior even in the face of negative feedback.
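In RL models, choice determinism is often captured by the inverse temperature of a softmax policy. The sketch below assumes that standard formulation (the summary does not say which choice rule the authors fitted), and the parameter name `beta` is illustrative.

```python
import math


def softmax_policy(q_values, beta):
    """Return choice probabilities; a larger beta gives a more deterministic policy."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


print([round(p, 2) for p in softmax_policy([0.6, 0.4], beta=2)])   # [0.6, 0.4]
print([round(p, 2) for p in softmax_policy([0.6, 0.4], beta=20)])  # [0.98, 0.02]
```

With the same modest value gap, a high `beta` leaves the agent almost no chance of sampling the alternative option, so it rarely gathers the evidence that would reveal a reversal.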

### Value Polarization Caused by Counterfactual Suppression

The models' value estimates for unselected options are biased: counterfactual reasoning (i.e., "what if I had chosen the other option?") is suppressed, which polarizes the values assigned to the two options.

These three mechanisms can act independently or together to cause the observed rigid adaptive behavior.

## Research Significance and Future Directions: Implications from Evaluation to AI Safety

### Implications for LLM Evaluation

The study emphasizes that evaluations of large language models should pay special attention to performance in non-stationary environments. Traditional static benchmarks may not capture a model's adaptive weaknesses under dynamic change. The researchers suggest developing reversal-sensitive diagnostic tools and volatility-aware evaluation models to test the decision-making capabilities of LLMs more comprehensively.

### Implications for AI Safety

If AI systems are excessively stubborn when the environment changes, they may pose risks in practical applications. In scenarios such as autonomous driving, medical diagnosis, or financial trading, for example, a system must quickly recognize environmental changes and adjust its strategy. Understanding and reducing the adaptive rigidity of LLMs is therefore important for building more reliable AI systems.

### Future Research Directions

This study opens up several directions for subsequent work: exploring training methods that improve models' loss sensitivity, designing dedicated techniques for enhancing adaptability, and extending the reversal learning paradigm to more complex multi-step decision-making tasks.

## Conclusion: Key Findings and Reference Value of LLM Adaptive Rigidity

Through systematic reversal learning experiments, this study reveals the adaptive rigidity exhibited by mainstream large language models in non-stationary environments. Although these models perform well in static tasks, they have obvious limitations in using negative feedback and quickly adjusting strategies. This finding not only enhances our understanding of the decision-making mechanisms of LLMs but also provides an important reference for the future development of more adaptive AI systems.