Section 01
[Introduction] OmniVerifier-M1: Core Breakthroughs of a Multimodal Universal Verifier
This paper proposes OmniVerifier-M1, a multimodal verifier that grounds meta-verification in symbolic outputs (e.g., bounding boxes) and is trained with decoupled reinforcement learning objectives. It achieves robust verification, fine-grained error localization, and dynamic region-level self-correction. The verifier supports general visual verification tasks and can also be used to strengthen generation systems (e.g., M1-TTS) by improving their output quality, which provides a foundation for deploying multimodal models reliably.
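
To make the idea of symbolic outputs as a basis for meta-verification concrete, the sketch below shows one way a predicted error region could be checked automatically against a reference region. It is an illustration under stated assumptions, not the paper's method: the summary does not say which metric or threshold is used, so intersection-over-union (IoU), the 0.5 threshold, and the names `iou` and `box_is_verified` are all hypothetical choices made for this example.

```python
from typing import Tuple

# Assumed box format (not specified in the source): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs have no meaningful overlap ratio.
    return inter / union if union > 0 else 0.0


def box_is_verified(predicted: Box, reference: Box, threshold: float = 0.5) -> bool:
    """Hypothetical check: accept a localized region if it overlaps the
    reference region by at least `threshold` IoU. The threshold is an
    assumption for illustration, not a value taken from the paper."""
    return iou(predicted, reference) >= threshold
```

Because a bounding box is a structured value rather than free text, a check like this can be computed without a human or a second model judging the output, which is one reason a symbolic output is a convenient target for verification.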