Zing Forum

Reading

FAST: Fast-Slow Thinking with GRPO Boosts Large Vision-Language Model Reasoning

FAST is an innovative fast-slow thinking training method that enhances the reasoning capabilities of large vision-language models (VLMs) via the GRPO reinforcement learning framework, and it has received Spotlight recognition at NeurIPS 2025.

Vision-Language Models (VLM) · GRPO · Fast-Slow Thinking · Reinforcement Learning · Visual Reasoning · NeurIPS 2025
Published 2026-04-16 11:50 · Recent activity 2026-04-16 11:56 · Estimated read 6 min

Section 01

FAST: Fast-Slow Thinking with GRPO Boosts VLM Reasoning (NeurIPS 2025 Spotlight)

FAST is an innovative fast-slow thinking training method that enhances the reasoning capabilities of large vision-language models (VLMs) using the GRPO reinforcement learning framework, and it has received Spotlight recognition at NeurIPS 2025. Its core idea is to introduce dual-process theory from cognitive science, letting the model dynamically select a thinking mode and optimizing those reasoning decisions with GRPO, in order to address VLMs' insufficient capacity for deep reasoning.


Section 02

Challenges in Vision-Language Model Reasoning

VLMs face unique challenges in reasoning tasks: integrating multi-modal information, understanding visual details precisely, keeping reasoning chains interpretable, and staying computationally efficient. Traditional supervised learning relies on replicating the reasoning patterns found in training data, which makes it difficult to cultivate genuine reasoning ability and leads to poor performance in out-of-distribution scenarios.


Section 03

Fast-Slow Thinking Mechanism: Inspiration from Cognitive Science

FAST is based on dual-process theory from cognitive science: fast thinking (System 1) is quick, intuitive, and automatic, handling routine tasks; slow thinking (System 2) is deliberate, analytical, and accurate, handling complex problems. The model learns to switch thinking modes dynamically based on task complexity, using fast thinking for simple problems and slow thinking for complex ones.
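The mode-switching idea can be sketched as a simple routing function. This is a minimal illustration with a hypothetical complexity score and threshold, not the paper's actual switching rule:

```python
# Hypothetical sketch: route a query to the fast or slow path based on
# an estimated task-complexity score in [0, 1]. The score itself would
# come from features such as visual complexity; here it is just an input.

def choose_mode(complexity_score: float, threshold: float = 0.5) -> str:
    """Return 'slow' for hard tasks, 'fast' for routine ones."""
    return "slow" if complexity_score >= threshold else "fast"

# Simple queries take the cheap fast path; harder ones get full reasoning.
print(choose_mode(0.2))  # fast
print(choose_mode(0.8))  # slow
```

In practice the threshold (or a learned policy replacing it) is what the training procedure tunes, so that compute is spent only where slow reasoning actually improves the answer.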


Section 04

GRPO Framework and FAST Training Architecture

FAST adopts the GRPO (Group Relative Policy Optimization) reinforcement learning framework, whose core features are intra-group comparison (candidate answers generated for the same prompt are evaluated relative to one another), relative rewards (advantages derived from within-group ranking), and policy stability (a clipped objective that prevents excessively large updates). The training architecture includes a dual-path reasoning network (fast and slow paths), an adaptive switching mechanism driven by factors such as visual complexity, and multi-modal reasoning chains; training follows a curriculum learning strategy that progresses from simple basic tasks to complex advanced ones.
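The two GRPO ingredients named above, group-relative advantages and a clipped update, can be sketched in a few lines. This is a simplified illustration of the general GRPO recipe, not the exact implementation used by FAST:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each candidate's reward
    by the mean and std of its own sampling group."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped policy objective (as in PPO), which GRPO reuses
    to keep each policy update bounded."""
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# Four candidate answers sampled for one prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)  # best answer gets a positive advantage, worst a negative one

# A large probability ratio is clipped, bounding the update.
print(clipped_surrogate(ratio=2.0, advantage=1.0))  # 1.2
```

Because the advantage is computed within each group rather than from a learned value function, GRPO avoids training a separate critic, which is part of why the text above describes it as more stable than plain RL baselines.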


Section 05

Experimental Results and Method Comparison

FAST significantly outperforms baseline models in reasoning accuracy, computational efficiency, generalization ability, and interpretability. Compared with chain-of-thought methods, its adaptive reasoning avoids wasting compute on easy inputs; compared with pure RL methods, its training is more stable; compared with model-scaling approaches, it improves performance by allocating computation intelligently, making it more practical to deploy.


Section 06

Application Scenarios of FAST

FAST is suitable for scenarios such as intelligent document analysis (processing complex text-image documents), educational assistance (displaying problem-solving reasoning chains), scientific research (analyzing scientific images), and visual question-answering systems (efficiently handling various queries), balancing accuracy and efficiency.


Section 07

Limitations and Future Directions

FAST has several limitations: the switching mechanism relies on heuristic rules, multi-modal fusion leaves room for improvement, and the method has not yet been extended to other modalities. Future directions include using meta-learning to adjust the switching mechanism dynamically, improving multi-modal fusion, extending to modalities such as audio and video, and balancing training and inference compute budgets.