Section 01
Introduction / Main Post: HandVQA: Diagnose and Improve Fine-Grained Spatial Reasoning of Hand in Vision-Language Models
This article introduces HandVQA, a project accepted to CVPR 2026: a large-scale, 3D-annotated hand visual question answering (VQA) benchmark containing over 1.6 million samples. It is designed to diagnose and improve the fine-grained reasoning of vision-language models (VLMs) on hand joint angles, distances, and spatial positions.
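To make the kinds of quantities involved concrete, here is a minimal Python sketch of how a joint angle and a fingertip distance can be derived from 3D hand keypoints. The keypoint names, values, and helper function are hypothetical illustrations, not HandVQA's actual annotation format or tooling; they only show the sort of geometric ground truth that fine-grained hand VQA questions could be checked against.

```python
import numpy as np

# Hypothetical 3D hand keypoints in metres (illustrative values only;
# the real HandVQA annotations may use a different format and convention).
index_mcp = np.array([0.02, 0.00, 0.45])   # index-finger MCP joint
index_pip = np.array([0.03, 0.03, 0.44])   # index-finger PIP joint
index_tip = np.array([0.04, 0.07, 0.43])   # index fingertip
thumb_tip = np.array([0.00, 0.05, 0.46])   # thumb tip

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle (degrees) at joint b formed by the segments b->a and b->c."""
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Flexion angle at the index PIP joint and index-thumb fingertip distance:
# examples of the joint-angle and distance quantities the benchmark probes.
angle_deg = joint_angle(index_mcp, index_pip, index_tip)
dist_cm = np.linalg.norm(index_tip - thumb_tip) * 100.0

print(f"Index PIP angle: {angle_deg:.1f} deg")
print(f"Index-thumb tip distance: {dist_cm:.1f} cm")
```

Quantities like these, computed from 3D annotations, are what distinguish fine-grained spatial questions about hands from the coarser scene-level questions typical of general VQA benchmarks.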