Section 01
[Introduction] Video Modality Diagnostics: Diagnose the True Video Understanding Capability of Multimodal Video Models
Video Modality Diagnostics (VMD) is a tool for diagnosing multimodal VideoQA models that take visual, audio, and subtitle inputs. It supports modality ablation, per-modality contribution analysis, and robustness testing, and can run either offline or with the Colab VLM backend. Its core purpose is to help researchers determine whether a model actually uses the video content, or instead "cheats" by relying on the audio or subtitles.
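The idea behind modality ablation and contribution analysis can be illustrated with a minimal leave-one-out sketch. This is not VMD's API. The functions `accuracy` and `ablation_report`, the sample dictionary layout, and the toy model are all hypothetical. The sketch also assumes that a modality is "removed" by passing `None` in its place, whereas the real tool may mask inputs differently. A modality's contribution is taken to be the accuracy drop when only that modality is withheld.

```python
from typing import Callable, Dict, List, Optional

MODALITIES = ("visual", "audio", "subtitle")

# Hypothetical model signature: (question, per-modality inputs) -> predicted answer
Model = Callable[[str, Dict[str, Optional[str]]], str]


def accuracy(model: Model, samples: List[dict], keep: frozenset) -> float:
    """Accuracy when only the modalities in `keep` are visible to the model."""
    correct = 0
    for s in samples:
        # Assumption: a withheld modality is masked by replacing it with None
        inputs = {m: (s[m] if m in keep else None) for m in MODALITIES}
        if model(s["question"], inputs) == s["answer"]:
            correct += 1
    return correct / len(samples)


def ablation_report(model: Model, samples: List[dict]) -> Dict[str, float]:
    """Leave-one-out ablation: drop each modality in turn and measure the accuracy loss."""
    full = frozenset(MODALITIES)
    base = accuracy(model, samples, full)
    report = {"all": base}
    for m in MODALITIES:
        acc = accuracy(model, samples, full - {m})
        report[f"drop_{m}"] = acc
        report[f"contrib_{m}"] = base - acc  # large value => model leans on m
    return report


def subtitle_only_model(question: str, inputs: Dict[str, Optional[str]]) -> str:
    """Toy 'cheating' model: ignores the frames and reads the answer off the subtitles."""
    sub = inputs["subtitle"]
    return sub.split(":")[-1].strip() if sub else "unknown"


samples = [
    {"question": "What color is the car?", "visual": "<frames>", "audio": "<wav>",
     "subtitle": "narrator: red", "answer": "red"},
    {"question": "How many people are there?", "visual": "<frames>", "audio": "<wav>",
     "subtitle": "narrator: three", "answer": "three"},
]

report = ablation_report(subtitle_only_model, samples)
# contrib_subtitle is 1.0 while contrib_visual is 0.0: the toy model never used the video
```

A model that genuinely understands the video would show a substantial `contrib_visual`. A near-zero visual contribution combined with a large subtitle or audio contribution is the "cheating" pattern the tool is meant to expose.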
Original author/maintainer: mlahozy21; Source platform: GitHub; Project link: https://github.com/mlahozy21/video-modality-diagnostics; Last updated: 2026-06-11T14:42:28Z.