Zing Forum

Multimodal Retail Decision Intelligence: A New Paradigm for Recommendation Systems Integrating Graph Neural Networks and Large Language Models

This master's thesis research project explores integrating transaction data, product metadata, text reviews, and product images into a unified graph neural network framework, combining the semantic understanding capabilities of large language models to improve the accuracy of retail recommendations, demand forecasting, and customer behavior analysis.

Tags: Multimodal Learning · Graph Neural Networks · Large Language Models · Retail Recommendation · Demand Forecasting · Customer Behavior Analysis · Explainable AI · GNN · LLM · Recommender Systems
Published 2026-05-17 00:36 · Recent activity 2026-05-17 00:51 · Estimated read: 8 min

Section 01

[Overview] Multimodal Retail Decision Intelligence: A New Recommendation Paradigm Integrating GNN and LLM

This master's thesis research project explores integrating transaction data, product metadata, text reviews, and product images into a unified graph neural network (GNN) framework, combining the semantic understanding of large language models (LLMs) to improve the accuracy of retail recommendations, demand forecasting, and customer behavior analysis, and to provide an interpretable basis for decision-making.

Section 02

Research Background and Motivation

Retail industry data exhibits multimodal characteristics: transaction records are structured data, product descriptions are text, user reviews carry sentiment, and product images provide visual features. Traditional recommendation systems typically use only a subset of these data types, making it difficult to capture the complex relationships among them. The goal of this project is to integrate GNNs, LLMs, and multimodal embedding techniques to build an intelligent retail decision-support system that improves performance while providing interpretability.

Section 03

Core Research Questions

The project focuses on the following objectives:

  1. Constructing multimodal retail knowledge representation: How to uniformly represent transaction data, product metadata, text reviews, and product images?
  2. Learning graph structure relationships: How to capture complex relationships between entities such as users, products, and categories?
  3. Integrating LLM semantic understanding: How to use LLM to enhance the semantic understanding of text data?
  4. Improving recommendation and prediction performance: Can multimodal fusion improve recommendation accuracy and demand forecasting?
  5. Providing interpretable outputs: How to make the AI decision-making process understandable to humans?
Section 04

Technical Architecture Overview

Multimodal Data Fusion

Integrate five types of data sources: transaction data (purchase history, timestamps, etc.), product metadata (category, brand, etc.), text reviews (user evaluations), product images (visual features), and graph relationships (user-product interactions, etc.).
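A minimal sketch of the fusion idea, using NumPy with stand-in embeddings: the embedding dimensions, the random projection, and the early-fusion (concatenate-then-project) strategy are all assumptions for illustration, not the thesis's final design. In practice the projection would be a learned layer and the per-modality embeddings would come from real encoders.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-product embeddings from each modality (dimensions assumed):
text_emb = rng.normal(size=(10, 384))   # e.g. sentence-embedding output
image_emb = rng.normal(size=(10, 512))  # e.g. CNN visual features
meta_emb = rng.normal(size=(10, 16))    # encoded category/brand attributes

# Early fusion: concatenate all modalities, then project to a shared dimension.
# Here the projection matrix is random; in a real model it would be learned.
fused = np.concatenate([text_emb, image_emb, meta_emb], axis=1)  # (10, 912)
W = rng.normal(size=(fused.shape[1], 128)) / np.sqrt(fused.shape[1])
product_repr = fused @ W  # (10, 128) unified product representation
print(product_repr.shape)
```

Late-fusion alternatives (per-modality encoders whose outputs are combined by attention) are also common; which strategy wins is exactly what RQ1 below investigates.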

GNN Modeling

Nodes include entities such as users, products, and categories; edges represent relationships like purchase, browsing, and similarity. Aggregate neighbor information through message-passing mechanisms to learn high-order graph structure features.
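One round of the message-passing idea can be sketched in NumPy on a toy user-product graph. The mean-aggregation-plus-concatenation update shown here is GraphSAGE-style; the toy graph, one-hot features, and single propagation step are illustrative assumptions, not the project's actual model.

```python
import numpy as np

# Toy bipartite graph: 3 users, 4 products; A[u, p] = 1 if user u bought product p.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

user_h = np.eye(3)  # initial user features (one-hot for illustration)
prod_h = np.eye(4)  # initial product features

# One round of mean-aggregation message passing:
deg_u = A.sum(axis=1, keepdims=True)        # purchases per user
deg_p = A.sum(axis=0, keepdims=True).T      # buyers per product
user_msg = A @ prod_h / deg_u               # each user averages its products
prod_msg = A.T @ user_h / deg_p             # each product averages its buyers

# GraphSAGE-style update: concatenate self features with neighbour aggregate.
user_h1 = np.concatenate([user_h, user_msg], axis=1)
prod_h1 = np.concatenate([prod_h, prod_msg], axis=1)
print(user_h1.shape, prod_h1.shape)
```

Stacking several such rounds is what lets the model pick up the high-order structure mentioned above (e.g. "users who bought products bought by similar users").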

LLM Enhancement

The LLM plays three roles: generating text embeddings (for product descriptions and user reviews), reasoning to compensate for the limitations of structured data, and automatically generating natural-language explanations of recommendation decisions.
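A small sketch of how such text embeddings could feed recommendations, assuming the vectors were already produced offline (e.g. by a sentence-embedding model's encode step); the stand-in random vectors and the nearest-neighbour retrieval are assumptions for illustration.

```python
import numpy as np

# Stand-ins for precomputed description/review embeddings (one row per product).
rng = np.random.default_rng(0)
product_emb = rng.normal(size=(5, 8))
# A query embedding constructed to lie near product 2, simulating a user
# whose review text resembles that product's description.
query_emb = product_emb[2] + 0.01 * rng.normal(size=8)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(query_emb, p) for p in product_emb])
best = int(scores.argmax())
print(best)  # 2: the query was constructed near product 2's embedding
```

In the full system these similarity scores would be one signal among several, combined with the GNN's structural representations rather than used alone.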

Section 05

Research Methodology

Adopt a modular process:

  • RQ0 Data Preparation: Clean and align data, using three public datasets: RetailRocket, Amazon Product Data, and Instacart Market Basket.
  • RQ1 Multimodal Embedding: Explore text embedding (SentenceTransformers), image feature extraction, structured data encoding, and fusion strategies.
  • RQ2 Graph Construction: Define node types and edge relationships, and build a retail knowledge graph.
  • RQ3 GNN Modeling: Experiment with architectures like GCN, GAT, and GraphSAGE.
  • RQ4 LLM Reasoning: Research prompt engineering, chain-of-thought, and other techniques to integrate LLM.
  • RQ5 Interpretability Analysis: Generate human-understandable explanations and evaluate their quality.
  • RQ6 Performance Evaluation: Evaluate metrics such as recommendation accuracy, prediction precision, and computational efficiency.
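The ranking metrics named in RQ6 can be made concrete with two standard definitions, Recall@K and NDCG@K; the item IDs below are hypothetical and binary relevance is assumed for simplicity.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Normalised discounted cumulative gain for binary relevance."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

recs = ["p3", "p1", "p7", "p2"]   # model's ranked output (hypothetical IDs)
truth = {"p1", "p2"}              # items the user actually interacted with
print(recall_at_k(recs, truth, 4))  # 1.0: both relevant items are in the top-4
```

NDCG additionally rewards placing relevant items earlier in the list, which matters when only the first few slots of a recommendation widget are visible.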
Section 06

Technology Stack and Experimental Environment

Technology Stack

Use Python ecosystem tools: PyTorch (deep learning), PyTorch Geometric (GNN), Scikit-learn/XGBoost (traditional ML), Transformers/Hugging Face (LLM), SentenceTransformers (text embedding), Pandas/NumPy/Dask (data processing), Matplotlib, etc. (visualization).

Experimental Environment

The development environment is Apple Mac Mini M4 (24GB RAM, macOS). Ensure experimental reproducibility through fixed random seeds, modular notebooks, and version control.
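The fixed-seed practice can be sketched as a small helper; the function name and the exact set of libraries seeded are assumptions (the PyTorch seeding is guarded so the sketch also runs where torch is absent).

```python
import os
import random
import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Fix the common sources of randomness for reproducible experiments."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch  # only if the deep-learning stack is installed
        torch.manual_seed(seed)
    except ImportError:
        pass

seed_everything(42)
a = np.random.rand(3)
seed_everything(42)
b = np.random.rand(3)
print(bool((a == b).all()))  # True: reseeding reproduces the same draws
```

Calling such a helper at the top of every notebook, together with version control over the notebooks themselves, is what makes the reported numbers re-derivable.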

Section 07

Project Contributions and Value

Key Contributions:

  1. Methodological Innovation: Propose a retail decision intelligence framework integrating GNN, LLM, and multimodal embedding.
  2. System Implementation: Provide open-source implementation (data processing, model training, evaluation workflow).
  3. Experimental Validation: Verify the effectiveness of the method on multiple public datasets.
  4. Interpretability: Explore the interpretability of AI decisions to enhance user trust.

Industry Value: It is expected to improve the personalization of recommendations and the accuracy of demand forecasting, providing comprehensive data support for business decisions.

Section 08

Open Source and Academic Standards

The project follows best practices for academic open source:

  • Provide complete citation information for easy reference.
  • Use public datasets to ensure result reproducibility.
  • Modular design for easy expansion and modification.
  • Detailed documentation and code comments.

An open attitude promotes knowledge sharing and technological progress in the retail AI field.