Section 01
Introduction to the RNN-based Image Caption Generation Project
This project is an image caption generation system implemented in PyTorch. It combines a ResNet50 feature extractor with an RNN decoder, a classic application of multimodal deep learning at the intersection of computer vision and natural language processing. The project originates from the practical assessment for the COMP5625M Deep Learning course at the University of Leeds, and aims to build an understanding of the core techniques for training on multimodal data by constructing a complete system.