Section 01
[Introduction] Practical Exploration of Multimodal Emotion Recognition with ResNet-50 and CLIP Fusion
This article introduces a multimodal emotion recognition framework that combines ResNet-50 visual features with CLIP text embeddings through a late fusion strategy, offering a practical reference for cross-modal learning. It is a course project for HAICAI 2026, released by makisb on GitHub (link: https://github.com/makisb/multimodal-emotion-recognition). The core idea is a dual-branch model: one branch processes visual information, the other processes text, and the two outputs are then combined by weighted fusion to predict the emotion.
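To make the weighted late fusion idea concrete, here is a minimal sketch in plain Python. It assumes each branch has already produced per-class logits (in the project these would come from the ResNet-50 and CLIP branches) and that fusion is a convex combination of the two branches' softmax probabilities. The label set `EMOTIONS`, the function names, and the `visual_weight` parameter are hypothetical and chosen only for illustration; the repository may fuse its branches differently.

```python
import math

# Hypothetical label set; the project's actual classes may differ.
EMOTIONS = ["positive", "neutral", "negative"]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_logits, text_logits, visual_weight=0.5):
    """Fuse two branches at the decision level.

    Each branch is turned into a probability distribution on its own, then
    the distributions are mixed: w * p_visual + (1 - w) * p_text.
    `visual_weight` is an assumed hyperparameter, not a value from the project.
    """
    if len(visual_logits) != len(text_logits):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    p_visual = softmax(visual_logits)
    p_text = softmax(text_logits)
    return [
        visual_weight * pv + (1.0 - visual_weight) * pt
        for pv, pt in zip(p_visual, p_text)
    ]


def predict(visual_logits, text_logits, visual_weight=0.5):
    """Return the label with the highest fused probability."""
    fused = late_fusion(visual_logits, text_logits, visual_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Made-up logits: the image branch leans "positive", the text branch "negative".
    image_scores = [2.0, 0.1, 0.1]
    text_scores = [0.1, 0.1, 3.0]
    print(predict(image_scores, text_scores, visual_weight=0.5))
    print(predict(image_scores, text_scores, visual_weight=0.9))
```

Because the mix is convex, the fused scores remain a valid probability distribution, and changing `visual_weight` shifts how much the final decision trusts the image branch relative to the text branch.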