Section 01
Project Overview: ResNet-50 + LSTM Image Captioning Model
This project implements a classic encoder-decoder image captioning model: a pre-trained ResNet-50 encoder extracts image features, and an LSTM decoder generates the caption text. It achieves a BLEU-4 score of approximately 0.21 on the Flickr30k dataset. The project is a good starting point for learning multi-modal AI, covering the full pipeline from data preprocessing to model evaluation.
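To make the BLEU-4 figure more concrete, here is a minimal sketch of how the metric can be computed for a single caption. It is an illustration only, under stated assumptions: the function names `ngrams` and `bleu4` are hypothetical, the score is sentence-level with uniform weights and no smoothing, and the project's own evaluation may well differ (reported scores are usually corpus-level and computed with an established library).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 with uniform weights and no smoothing (illustrative)."""
    precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        precisions.append(clipped / total)

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / 4)


if __name__ == "__main__":
    # Made-up captions for illustration, not taken from Flickr30k.
    candidate = "a dog runs across the green grass".split()
    references = ["a dog runs across the grass".split()]
    print(f"BLEU-4: {bleu4(candidate, references):.3f}")
```

Because the score is a geometric mean of 1-gram to 4-gram precisions, a generated caption must match the reference wording over runs of four consecutive words to score well, which is one reason BLEU-4 values for captioning models tend to sit far below 1.0.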