# Fast-Slow Training (FST) Framework: Enabling Continuous Adaptive Evolution of Large Language Models

> Researchers from institutions including the University of California, Berkeley, proposed the Fast-Slow Training (FST) framework, which treats model parameters as "slow weights" and optimized context as "fast weights". It achieves task-specific learning while maintaining the model's general capabilities. Experiments show that FST improves sample efficiency by 3x, reduces KL divergence by 70%, and significantly outperforms traditional RL methods in continuous learning scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-12T17:58:20.000Z
- Last activity: 2026-05-13T18:48:51.923Z
- Popularity: 126.2
- Keywords: large language models, continual learning, catastrophic forgetting, reinforcement learning, in-context learning, model plasticity, Fast-Slow Training, parameter-efficient fine-tuning
- Page link: https://www.zingnex.cn/en/forum/thread/fst
- Canonical: https://www.zingnex.cn/forum/thread/fst
- Markdown source: floors_fallback

---

## [Introduction] Fast-Slow Training (FST) Framework: Solving the Core Dilemma of LLM Continuous Learning

Researchers from institutions including the University of California, Berkeley, proposed the Fast-Slow Training (FST) framework, which treats model parameters as "slow weights" (to maintain general reasoning abilities) and optimized context as "fast weights" (to absorb task-specific information). It achieves task specialization while preserving general capabilities. Experiments show a 3x improvement in sample efficiency, a 70% reduction in KL divergence, significantly better performance than traditional RL in continuous learning, and effective mitigation of catastrophic forgetting and loss of plasticity.

## Research Background: The Binary Dilemma of LLM Training

Traditional LLM training relies on parameter updates (e.g., RL), which easily lead to catastrophic forgetting and loss of plasticity; in-context learning, by contrast, is cheap and adapts quickly, but its performance ceiling is limited. The core question: must learning be confined to the binary choice between "in-context" and "in-weight"?

## FST Framework Design: Fast-Slow Weight Collaboration Mechanism

### Slow Weights
Corresponds to the actual model parameters, kept close to the pre-trained state to preserve general capabilities and avoid excessive drift.

### Fast Weights
Virtual weights implemented via optimized context, learning task information from text feedback without modifying model parameters.

### Collaboration Mechanism
Fast weights adapt quickly to tasks, slow weights maintain general capabilities; the division of labor balances efficiency and generalization.
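The division of labor above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the names `FSTState`, `fast_update`, and `slow_update` are hypothetical, and the pull-back term in `slow_update` stands in for whatever regularization keeps slow weights near the pre-trained state.

```python
# Hypothetical sketch of the FST fast/slow split (all names are
# illustrative, not from the paper). Slow weights: actual parameters,
# updated rarely and anchored to the pre-trained state. Fast weights:
# an optimized context, updated from plain-text task feedback.
from dataclasses import dataclass, field

@dataclass
class FSTState:
    slow_weights: dict                                 # model parameters
    fast_context: list = field(default_factory=list)   # optimized context

def fast_update(state: FSTState, feedback: str) -> FSTState:
    """Absorb task-specific information into the context only.
    No parameters change, so general capabilities are untouched."""
    state.fast_context.append(feedback)
    return state

def slow_update(state: FSTState, grads: dict, init: dict,
                lr: float = 1e-5, anchor_weight: float = 0.1) -> FSTState:
    """Occasional parameter update with a pull back toward the
    pre-trained weights, limiting drift from the original model."""
    for name, g in grads.items():
        w = state.slow_weights[name]
        anchor = init.get(name, w)
        state.slow_weights[name] = w - lr * g - anchor_weight * (w - anchor)
    return state
```

In this sketch, most learning flows through `fast_update` (cheap, reversible, parameter-free), while `slow_update` fires rarely and is explicitly regularized, which is one plausible reading of how FST balances efficiency against generalization.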

## Experimental Results: Dual Improvement in Efficiency and Stability

- Sample efficiency is 3x that of traditional RL, reaching target performance faster on the same data;
- KL divergence is reduced by 70%, keeping the model closer to the original distribution and retaining general knowledge;
- Catastrophic forgetting is significantly mitigated, with little interference to old knowledge when learning new tasks;
- Plasticity is maintained, making it easier to adapt to subsequent tasks after completing a task.
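The KL-divergence figure measures how far the adapted model's output distribution drifts from the pre-trained one. A toy illustration of the metric, with made-up next-token distributions (the numbers are not from the paper):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete next-token distributions. Lower KL from
    the pre-trained model means more general knowledge is retained."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions over a 3-token vocabulary (made up):
base  = [0.50, 0.30, 0.20]   # pre-trained model
rl_ft = [0.10, 0.10, 0.80]   # heavy parameter updates: large drift
fst   = [0.45, 0.32, 0.23]   # context-based adaptation: small drift

drift_rl  = kl_divergence(rl_ft, base)
drift_fst = kl_divergence(fst, base)
```

Because FST routes most task learning through context rather than parameters, its output distribution stays close to the base model's, which is exactly what a low KL score captures.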

## Continuous Learning Scenarios: Unique Advantages of FST

In dynamic continuous learning, traditional RL tends to stagnate, while FST keeps acquiring new task knowledge, making it suitable for long-term deployment and practical applications that must continually adapt to new environments.

## Technical Significance and Application Prospects

FST breaks the binary opposition between parameter updates and in-context learning, providing a new training paradigm. Its application value includes:
- Efficient fine-tuning: adapting to specific domains with little data;
- Stable deployment: maintaining general capabilities in continuous services;
- Multi-task adaptation: no interference when switching between tasks.

## Research Insights and Future Directions

Inspired by the dual-process theory of human cognition (System 1/System 2 thinking), FST rethinks the nature of learning. Future directions include exploring multi-level learning mechanisms and extending the approach to multimodal models and embodied intelligence; improving efficiency and adaptability without increasing parameter counts remains an important direction for the LLM field.
