Section 01
Core Results of a Multimodal Real Estate Valuation Model Integrating CLIP Visual Features
This paper proposes a multimodal real estate valuation model that combines traditional tabular data with visual features extracted zero-shot by the CLIP model. On a dataset of 730 properties in Gijón, Spain, the model significantly outperforms tabular-only baselines. The core innovation is the use of CLIP's zero-shot capability to capture visual information from property photos, such as interior decoration and lighting, which gives the valuation model a more comprehensive set of features.
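To make the idea of zero-shot visual features concrete, the following is a minimal sketch of one way such features could be computed and joined to a tabular row. It is not the paper's implementation: the summary above does not describe the fusion method, so the prompt set, the averaging across photos, the temperature of 100, and all function names here are assumptions. The short vectors stand in for embeddings that a CLIP image encoder and text encoder would produce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_scores(image_emb, prompt_embs, temperature=100.0):
    """Softmax over scaled cosine similarities between one image
    embedding and each text-prompt embedding (CLIP-style scoring).
    The temperature value is an assumption, not taken from the paper."""
    logits = [temperature * cosine(image_emb, p) for p in prompt_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(tabular_row, photo_embs, prompt_embs):
    """Average the zero-shot scores over all photos of a property and
    append them to the tabular features (hypothetical fusion scheme)."""
    per_photo = [zero_shot_scores(e, prompt_embs) for e in photo_embs]
    n_photos = len(per_photo)
    visual = [
        sum(scores[k] for scores in per_photo) / n_photos
        for k in range(len(prompt_embs))
    ]
    return list(tabular_row) + visual

# Toy stand-ins for text embeddings of two hypothetical prompts,
# e.g. "a bright, renovated room" and "a dark, dated room".
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Toy stand-ins for image embeddings of two photos of one property.
photo_embs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
# Hypothetical tabular features: area in m^2, bedrooms, bathrooms.
tabular_row = [85.0, 3.0, 1.0]

fused = fuse(tabular_row, photo_embs, prompt_embs)
print(fused)
```

The fused vector keeps the tabular features unchanged and adds one score per prompt, so it can be passed to any tabular regressor. Averaging across photos is only one option; a maximum or a per-room breakdown would be equally plausible under the same assumptions.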