Section 01
[Introduction] Innovative Application of Multimodal Vision-Language Models in Building Entrance Detection
This article introduces a multimodal building entrance detection system that integrates aerial imagery, street view images, GPS trajectories, and geospatial data. The system fine-tunes vision-language models with LoRA and DoRA, two parameter-efficient fine-tuning methods, to achieve accurate spatial reasoning and entrance localization, addressing the limited detection accuracy of traditional methods that rely on a single data source. It has practical value in scenarios such as intelligent navigation and emergency rescue.