Section 01
[Introduction] A Multimodal Product Classification System in Practice: Improving Classification Accuracy by Combining Images and Text
This project addresses product classification in e-commerce retail. Traditional single-modal classifiers rely on either images or text alone, so they miss information carried by the other modality. To overcome this limitation, we build a multimodal machine learning system that combines image and text embeddings: ResNet50 and ConvNeXt V2 extract image features, and MiniLM produces text embeddings. The target for the multimodal model is an accuracy of at least 85% and an F1 score of at least 80%, so that it can provide more reliable classification for downstream scenarios such as inventory management and recommendation systems.
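To make the idea concrete, here is a minimal sketch of how an image embedding and a text embedding might be combined and how predictions could be checked against the stated targets. It is not the project's actual implementation: fusion by concatenating L2-normalized vectors, macro-averaged F1, and the helper names `fuse_embeddings`, `macro_f1`, and `meets_targets` are all assumptions made for illustration. In practice the input vectors would come from the image backbones and MiniLM rather than being typed in by hand.

```python
import math

ACCURACY_TARGET = 0.85  # accuracy goal stated in the introduction
F1_TARGET = 0.80        # F1 goal stated in the introduction


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(image_vec, text_vec):
    """Concatenate normalized image and text embeddings.

    Concatenation is an assumed fusion strategy; the source does not
    specify how the two modalities are combined.
    """
    return l2_normalize(image_vec) + l2_normalize(text_vec)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def meets_targets(y_true, y_pred):
    """Check predictions against both project targets."""
    return (accuracy(y_true, y_pred) >= ACCURACY_TARGET
            and macro_f1(y_true, y_pred) >= F1_TARGET)


# Toy vectors standing in for real image and text embeddings.
fused = fuse_embeddings([3.0, 4.0], [0.0, 2.0])
```

Normalizing each modality before concatenation keeps either embedding from dominating the fused vector purely because of its scale. The fused vector would then be passed to whichever classifier the project trains.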