MLLM-HSGG Dataset: Multimodal Large Language Model-Enhanced High-Information Scene Graph Generation
This article introduces the MLLM-HSGG dataset, which uses multimodal large language models (MLLMs) to enhance scene graph generation (SGG). SGG is the task of representing an image as a graph whose nodes are objects and whose edges are the relationships between them, usually written as subject-predicate-object triplets. The dataset aims to provide a richer structured representation for visual understanding. Its core features are multimodal fusion, high information density, and improved quality, and it is intended to advance research on scene graph generation.
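To make the idea of a "structured representation" concrete, here is a minimal sketch of a generic scene graph: labeled objects with bounding boxes as nodes, and subject-predicate-object relations as edges. This is an illustration only, not the MLLM-HSGG annotation schema. The class names (`SceneObject`, `Relation`, `SceneGraph`), their fields, and the example labels are hypothetical, and the MLLM enhancement step is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical, generic scene-graph structure for illustration only.
# This is NOT the published MLLM-HSGG schema; all names are assumptions.


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    label: str  # object category, e.g. "man"
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Relation:
    subject_id: int
    predicate: str  # relationship label, e.g. "riding"
    object_id: int


@dataclass
class SceneGraph:
    objects: dict[int, SceneObject] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.obj_id] = obj

    def add_relation(self, rel: Relation) -> None:
        # Both endpoints must already exist as nodes in the graph.
        if rel.subject_id not in self.objects or rel.object_id not in self.objects:
            raise KeyError("relation refers to an unknown object id")
        self.relations.append(rel)

    def triplets(self) -> list[tuple[str, str, str]]:
        """Return (subject, predicate, object) label triplets."""
        return [
            (self.objects[r.subject_id].label, r.predicate, self.objects[r.object_id].label)
            for r in self.relations
        ]


# Build a small example graph for one hypothetical image.
g = SceneGraph()
g.add_object(SceneObject(0, "man", (10.0, 20.0, 110.0, 220.0)))
g.add_object(SceneObject(1, "horse", (5.0, 80.0, 200.0, 300.0)))
g.add_object(SceneObject(2, "hat", (40.0, 10.0, 80.0, 40.0)))
g.add_relation(Relation(0, "riding", 1))
g.add_relation(Relation(0, "wearing", 2))
print(g.triplets())  # [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]
```

Keeping objects in a dictionary keyed by id lets relations reference nodes cheaply and lets `add_relation` reject edges that point to missing objects. In this simple picture, a more "information-dense" graph is one that captures more objects and relations per image.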