Zing Forum

Reading

FAST: Fast-Slow Thinking with GRPO Boosts Large Vision-Language Model Reasoning

FAST is an innovative fast-slow thinking training method that enhances the reasoning capabilities of large vision-language models (VLMs) via the GRPO reinforcement learning framework, and it has received Spotlight recognition at NeurIPS 2025.

Vision-Language Models (VLM) · GRPO · Fast-Slow Thinking · Reinforcement Learning · Visual Reasoning · NeurIPS 2025
Published 2026-04-16 11:50 · Recent activity 2026-04-16 11:56 · Estimated read 6 min

Section 01

FAST: Fast-Slow Thinking with GRPO Boosts VLM Reasoning (NeurIPS 2025 Spotlight)

FAST is an innovative fast-slow thinking training method that enhances the reasoning capabilities of large vision-language models (VLMs) using the GRPO reinforcement learning framework, and it has received Spotlight recognition at NeurIPS 2025. Its core idea is to introduce dual-process theory from cognitive science, letting the model dynamically select a thinking mode and optimizing those reasoning decisions with GRPO, in order to address VLMs' insufficient capacity for deep reasoning.


Section 02

Challenges in Vision-Language Model Reasoning

VLMs face unique challenges in reasoning tasks: integrating multi-modal information, understanding visual details precisely, keeping reasoning chains interpretable, and staying computationally efficient. Traditional supervised learning relies on replicating the reasoning patterns found in training data, which makes it difficult to cultivate genuine reasoning ability and leads to poor performance in out-of-distribution scenarios.


Section 03

Fast-Slow Thinking Mechanism: Inspiration from Cognitive Science

FAST is based on dual-process theory from cognitive science: fast thinking (System 1) is quick, intuitive, and automatic, handling routine tasks; slow thinking (System 2) is deliberate, analytical, and accurate, handling complex problems. The model learns to switch thinking modes dynamically based on task complexity, using fast thinking for simple problems and slow thinking for complex ones.
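The mode-switching idea can be sketched as a simple routing function. This is a minimal illustration with a hypothetical complexity score and threshold, not the paper's actual switching rule:

```python
# Hypothetical sketch: route a query to the fast or slow path based on
# an estimated task-complexity score in [0, 1]. The score itself would
# come from features such as visual complexity; here it is just an input.

def choose_mode(complexity_score: float, threshold: float = 0.5) -> str:
    """Return 'slow' for hard tasks, 'fast' for routine ones."""
    return "slow" if complexity_score >= threshold else "fast"

# Simple queries take the cheap fast path; harder ones get full reasoning.
print(choose_mode(0.2))  # fast
print(choose_mode(0.8))  # slow
```

In practice the threshold (or a learned policy replacing it) is what the training procedure tunes, so that compute is spent only where slow reasoning actually improves the answer.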


Section 04

GRPO Framework and FAST Training Architecture

FAST adopts the GRPO (Group Relative Policy Optimization) reinforcement learning framework, whose core features are intra-group comparison (candidate answers generated for the same prompt are evaluated relative to one another), relative rewards (advantages derived from within-group ranking), and policy stability (a clipped objective that prevents excessively large updates). The training architecture includes a dual-path reasoning network (fast and slow paths), an adaptive switching mechanism driven by factors such as visual complexity, and multi-modal reasoning chains; training follows a curriculum learning strategy that progresses from simple basic tasks to complex advanced ones.
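The two GRPO ingredients named above, group-relative advantages and a clipped update, can be sketched in a few lines. This is a simplified illustration of the general GRPO recipe, not the exact implementation used by FAST:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each candidate's reward
    by the mean and std of its own sampling group."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped policy objective (as in PPO), which GRPO reuses
    to keep each policy update bounded."""
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# Four candidate answers sampled for one prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)  # best answer gets a positive advantage, worst a negative one

# A large probability ratio is clipped, bounding the update.
print(clipped_surrogate(ratio=2.0, advantage=1.0))  # 1.2
```

Because the advantage is computed within each group rather than from a learned value function, GRPO avoids training a separate critic, which is part of why the text above describes it as more stable than plain RL baselines.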


Section 05

Experimental Results and Method Comparison

FAST significantly outperforms baseline models in reasoning accuracy, computational efficiency, generalization ability, and interpretability. Compared with chain-of-thought methods, its adaptive reasoning avoids wasting compute on easy inputs; compared with pure RL methods, its training is more stable; compared with model-scaling approaches, it improves performance by allocating computation intelligently, making it more practical to deploy.


Section 06

Application Scenarios of FAST

FAST is suitable for scenarios such as intelligent document analysis (processing complex text-image documents), educational assistance (displaying problem-solving reasoning chains), scientific research (analyzing scientific images), and visual question-answering systems (efficiently handling various queries), balancing accuracy and efficiency.


Section 07

Limitations and Future Directions

FAST has several limitations: the switching mechanism relies on heuristic rules, multi-modal fusion leaves room for improvement, and the method has not yet been extended to other modalities. Future directions include using meta-learning to adjust the switching mechanism dynamically, improving multi-modal fusion, extending to modalities such as audio and video, and balancing training and inference compute budgets.