Section 01
Applying Vision-Language Models to Gait Screening: A Guide to Zero-Shot and Multimodal In-Context Learning
The Vera Research team has open-sourced the code and dataset from its study of vision-language models (VLMs) for gait-based classification screening. The study compares zero-shot prompting with multimodal in-context learning (ICL) for detecting Parkinson's disease and knee osteoarthritis.

Core conclusion: zero-shot VLMs perform poorly on this task, but similarity-guided multimodal ICL, in which the in-context examples are chosen by their similarity to the query, substantially narrows the performance gap with dedicated video encoders. The study offers useful evidence on how general-purpose AI models can be applied in specialized medical fields.
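Similarity-guided ICL hinges on one step: choosing which labeled examples to show the model alongside the query clip. The sketch below illustrates that step under stated assumptions: each clip already has a fixed-length embedding from some video or image encoder, cosine similarity is the retrieval metric, and the top-k matches are interleaved with their labels ahead of the query. The names `LabeledClip`, `select_demonstrations`, and `build_icl_prompt`, the label strings, and the `<video:...>` placeholder are hypothetical and are not taken from the released code.

```python
import math
from dataclasses import dataclass


@dataclass
class LabeledClip:
    """Hypothetical schema: a gait clip's precomputed embedding plus its diagnostic label."""
    clip_id: str
    embedding: list[float]
    label: str  # e.g. "normal", "parkinsons", "knee_oa" (illustrative labels)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demonstrations(query: list[float], pool: list[LabeledClip], k: int) -> list[LabeledClip]:
    """Return the k labeled clips most similar to the query embedding, most similar first."""
    ranked = sorted(pool, key=lambda clip: cosine_similarity(query, clip.embedding), reverse=True)
    return ranked[:k]


def build_icl_prompt(demos: list[LabeledClip], labels: list[str]) -> str:
    """Interleave demonstrations with their labels, ending with an unlabeled query slot.

    The <video:...> markers are hypothetical placeholders for wherever the model
    actually receives frames or video tokens.
    """
    lines = [f"Classify the gait in the final clip as one of: {', '.join(labels)}."]
    for i, demo in enumerate(demos, start=1):
        lines.append(f"Example {i}: <video:{demo.clip_id}> Label: {demo.label}")
    lines.append("Query: <video:query> Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = [
        LabeledClip("a", [1.0, 0.0], "normal"),
        LabeledClip("b", [0.0, 1.0], "parkinsons"),
        LabeledClip("c", [0.9, 0.1], "knee_oa"),
    ]
    demos = select_demonstrations([1.0, 0.0], pool, k=2)
    print(build_icl_prompt(demos, ["normal", "parkinsons", "knee_oa"]))
```

A zero-shot baseline corresponds to calling `build_icl_prompt` with an empty demonstration list, so the two settings differ only in whether retrieved examples precede the query. How many demonstrations the study uses, and how it embeds and presents clips to the model, are not specified here.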