Section 01
Introduction: DAMF Addresses VLM Fine-tuning Failure Under Extreme Physical Domain Transfer
This article examines why vision-language models (VLMs) such as BLIP fail to fine-tune under extreme physical domain transfer, for example underwater image captioning, and presents DAMF, a two-stage optimization protocol. By isolating visual realignment and applying controlled multimodal coupling, DAMF nearly triples BLEU-4 scores on underwater image captioning. The work has been accepted to ECCV 2026.