# FAST: Fast-Slow Thinking with GRPO Boosts Large Vision-Language Model Reasoning

> FAST is an innovative fast-slow thinking training method that enhances the reasoning capabilities of large vision-language models (VLMs) via the GRPO reinforcement learning framework, and it has received Spotlight recognition at NeurIPS 2025.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-16T03:50:35.000Z
- Last activity: 2026-04-16T03:56:09.802Z
- Popularity: 148.9
- Keywords: vision-language model, VLM, GRPO, fast-slow thinking, reinforcement learning, visual reasoning, NeurIPS 2025
- Page link: https://www.zingnex.cn/en/forum/thread/fast-grpo
- Canonical: https://www.zingnex.cn/forum/thread/fast-grpo
- Markdown source: floors_fallback

---

## FAST: Fast-Slow Thinking with GRPO Boosts VLM Reasoning (NeurIPS 2025 Spotlight)

FAST brings the dual-process theory from cognitive science into the training of large vision-language models (VLMs): the model learns to select a fast or slow thinking mode for each input, and the GRPO reinforcement learning framework optimizes those reasoning decisions. The goal is to address the limited deep-reasoning ability of current VLMs. The project received Spotlight recognition at NeurIPS 2025.

## Challenges in Vision-Language Model Reasoning

VLMs face distinct challenges in reasoning tasks: integrating multi-modal information, understanding visual details precisely, keeping reasoning chains interpretable, and staying computationally efficient. Traditional supervised learning trains the model to replicate reasoning patterns found in the training data, which makes it hard to develop reasoning that generalizes; performance is especially poor in out-of-distribution scenarios.

## Fast-Slow Thinking Mechanism: Inspiration from Cognitive Science

FAST is based on the dual-process theory in cognitive science. Fast thinking (System 1) is quick, intuitive, and automatic, and handles routine tasks. Slow thinking (System 2) is deliberate, analytical, and more accurate, and handles complex problems. The model learns to switch between the two modes according to task complexity, using fast thinking for simple problems and slow thinking for complex ones.
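The switching idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the FAST project: the `difficulty` score in [0, 1], the `threshold`, the `ThinkingMode` type, and the token budgets are all hypothetical. In FAST the switch is learned during training; the fixed threshold below only stands in for that decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingMode:
    """Hypothetical container for a thinking mode and its reasoning budget."""
    name: str
    max_reasoning_tokens: int


# Illustrative budgets only; the source gives no concrete numbers.
FAST_MODE = ThinkingMode(name="fast", max_reasoning_tokens=64)
SLOW_MODE = ThinkingMode(name="slow", max_reasoning_tokens=1024)


def select_mode(difficulty: float, threshold: float = 0.5) -> ThinkingMode:
    """Pick slow thinking for hard inputs and fast thinking for easy ones.

    `difficulty` is an assumed score in [0, 1]; how it would be estimated
    (for example from visual complexity) is outside this sketch.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    return SLOW_MODE if difficulty >= threshold else FAST_MODE


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        mode = select_mode(score)
        print(f"difficulty={score}: {mode.name} ({mode.max_reasoning_tokens} tokens)")
```

A simple question would get the short budget, while a multi-step problem would get the long one. This is the compute trade-off the fast-slow design aims for.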

## GRPO Framework and FAST Training Architecture

FAST adopts the GRPO (Group Relative Policy Optimization) reinforcement learning framework. Its core features are intra-group comparison (several candidate answers are generated for the same input and evaluated against each other), relative rewards (each candidate is scored relative to the rest of its group rather than on an absolute scale), and policy stability (a clipped objective prevents excessively large updates). The training architecture includes a dual-path reasoning network (fast and slow paths), an adaptive switching mechanism (driven by factors such as visual complexity), and multi-modal reasoning chains. Training follows a curriculum learning strategy that moves gradually from simple tasks to complex ones.
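The group-relative scoring and clipping described above can be sketched in a few lines. This follows the commonly used GRPO formulation, in which each reward is normalized by its group's mean and standard deviation and the update uses a PPO-style clipped term. It is not FAST's actual training code: the function names, the use of the population standard deviation, `eps`, and `clip_eps=0.2` are assumptions for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per candidate answer sampled for the
    same input. `eps` avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped term: min(ratio * A, clip(ratio, 1-e, 1+e) * A).

    `ratio` is the new-policy / old-policy probability ratio for a sample.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Four candidate answers to one question: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

if __name__ == "__main__":
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.1f} advantage={a:+.3f}")
    # A ratio far above 1 is capped at 1 + clip_eps for positive advantages.
    print(clipped_surrogate(ratio=1.5, advantage=1.0))
```

Because advantages are centered within each group, a candidate is rewarded only for being better than its siblings. A group in which every answer earns the same reward contributes no learning signal.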

## Experimental Results and Method Comparison

FAST is reported to outperform baseline models in reasoning accuracy, computational efficiency, generalization, and interpretability. Compared with always-on chain-of-thought methods, adaptive reasoning avoids spending long reasoning chains on easy inputs. Compared with pure RL methods, training is more stable. Compared with model scaling, FAST improves performance by allocating computation where it is needed rather than by adding parameters, which makes it more practical to deploy.

## Application Scenarios of FAST

FAST suits scenarios that need to balance accuracy and efficiency, such as intelligent document analysis (processing documents that mix text and images), educational assistance (showing the reasoning chain behind a solution), scientific research (analyzing scientific images), and visual question-answering systems (handling queries of varying difficulty efficiently).

## Limitations and Future Directions

FAST has several limitations: the switching mechanism relies on heuristic rules, multi-modal fusion has room for improvement, and the method has not yet been extended to other modalities. Future directions include using meta-learning to adjust the switching mechanism dynamically, improving multi-modal fusion, extending to modalities such as audio and video, and balancing the compute budgets for training and inference.