Section 01
Zero-Shot Multimodal Anomaly Detection: A Training-Free Industrial Quality Inspection Solution Combining OWL-ViT and SAM (Introduction)
This project proposes a training-free, zero-shot multimodal anomaly detection system. It pairs OWL-ViT v2 open-vocabulary detection with SAM pixel-level segmentation, so that industrial defects such as cracks, dents, and corrosion can be queried in natural language and localized precisely. The project is maintained by AC052001, and the source code is available on GitHub (https://github.com/AC052001/Zero-Shot-Multimodal-Anomaly-Detection-using-Vision-Language-Models). The project was published on May 24, 2026.
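The two-stage flow described above (text queries go to an open-vocabulary detector, and each resulting box prompts a segmenter) can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the repository's actual code: the names `Detection`, `Anomaly`, `inspect`, `fake_detect`, and `fake_segment` are hypothetical, the 0.3 score threshold is an assumed default, and the two stub functions stand in for real OWL-ViT v2 and SAM inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, end-exclusive
Mask = List[List[bool]]           # mask[y][x] is True where the defect is
Image = List[List[int]]           # toy grayscale image, image[y][x]


@dataclass
class Detection:
    label: str    # the text query that matched, e.g. "crack"
    score: float  # detector confidence in [0, 1]
    box: Box


@dataclass
class Anomaly:
    label: str
    score: float
    box: Box
    mask: Mask


def inspect(
    image: Image,
    queries: Sequence[str],
    detect: Callable[[Image, Sequence[str]], List[Detection]],
    segment: Callable[[Image, Box], Mask],
    score_threshold: float = 0.3,  # assumed value, tune per use case
) -> List[Anomaly]:
    """Detect boxes for each text query, then refine each box into a mask."""
    detections = detect(image, queries)
    kept = [d for d in detections if d.score >= score_threshold]
    return [Anomaly(d.label, d.score, d.box, segment(image, d.box)) for d in kept]


def fake_detect(image: Image, queries: Sequence[str]) -> List[Detection]:
    """Stand-in for an open-vocabulary detector (hypothetical fixed output)."""
    candidates = [
        Detection("crack", 0.82, (1, 1, 3, 2)),
        Detection("dent", 0.10, (0, 0, 1, 1)),
    ]
    return [d for d in candidates if d.label in queries]


def fake_segment(image: Image, box: Box) -> Mask:
    """Stand-in for a box-prompted segmenter: marks every pixel in the box."""
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 for x in range(len(row))]
        for y, row in enumerate(image)
    ]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    for anomaly in inspect(image, ["crack", "dent"], fake_detect, fake_segment):
        print(anomaly.label, anomaly.score, anomaly.box)
```

Passing the detector and segmenter in as callables keeps the pipeline logic independent of any particular model library; in a real setup the stubs would be replaced by wrappers around the actual models, and no training step is involved at any point.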