Section 01
V-tableR1: Process-Supervised Reinforcement Learning Enables Verifiable Multimodal Table Reasoning (Introduction)
V-tableR1 uses a process-supervised reinforcement learning framework to move large multimodal models from black-box pattern matching toward verifiable logical reasoning. The framework introduces a dedicated Critic VLM that provides step-by-step feedback on the reasoning process, and pairs it with the PGPO optimization algorithm. With only 4B parameters, V-tableR1 outperforms models 18 times its size and achieves state-of-the-art results among open-source models on complex table reasoning benchmarks.