Section 01
[Introduction] Multimodal Large Language Models Empower Scene Graph Generation: An In-Depth Analysis of the MLLM-HSGG Dataset
Scene Graph Generation (SGG) is a core computer vision task that extracts structured semantic information from images, typically as objects and the relationships between them. The rise of Multimodal Large Language Models (MLLMs) has opened new possibilities for SGG. The MLLM-HSGG dataset uses MLLMs to increase the information density and quality of scene graph annotations, relies on human-machine collaborative annotation, and supports descriptions at multiple levels of granularity. It has application value in fields such as image retrieval and visual question answering, and it points to a new direction for overcoming the bottlenecks of traditional SGG.