For each modality (text and image), the model extracts central and peripheral cues in parallel. Central cues are deep semantic representations: on the text side, a BERT model provides the [CLS] embedding vector; on the image side, a VGG-16 network provides the activations of the fc2 layer. Peripheral cues capture surface-level features: for text, these include polarity, subjectivity, readability, and extremeness indicators; for images, they cover visual attributes such as brightness, contrast, saturation, and edge strength. This dual-track design lets the model capture both the semantic content of a review and the presentation-level factors that influence its perceived helpfulness.
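The central cues would typically be obtained by running pretrained encoders (e.g., a Hugging Face BERT model for the [CLS] vector and a torchvision VGG-16 for the fc2 activations). The peripheral cues are cheaper to illustrate directly. The sketch below shows one plausible set of formulas for them; the paper does not specify exact definitions, so the lexicons, the readability proxy, and the gradient-based edge measure here are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def image_peripheral_cues(img):
    """Surface-level visual cues from an RGB image in [0, 1], shape (H, W, 3).

    Returns [brightness, contrast, saturation, edge_strength]; the exact
    formulas are illustrative stand-ins, not the paper's definitions.
    """
    # Luminance via the ITU-R BT.601 weighting of the RGB channels.
    lum = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    brightness = lum.mean()          # global mean luminance
    contrast = lum.std()             # RMS contrast
    # HSV-style saturation: (max - min) / max per pixel, 0 for black pixels.
    cmax, cmin = img.max(axis=-1), img.min(axis=-1)
    sat = np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-8), 0.0)
    saturation = sat.mean()
    # Edge strength: mean gradient magnitude of the luminance channel.
    gy, gx = np.gradient(lum)
    edge_strength = np.sqrt(gx ** 2 + gy ** 2).mean()
    return np.array([brightness, contrast, saturation, edge_strength])

def text_peripheral_cues(text,
                         positive=frozenset({"great", "good", "love"}),
                         negative=frozenset({"bad", "awful", "hate"})):
    """Surface-level textual cues from a review string.

    Returns [polarity, subjectivity, readability, extremeness]. The tiny
    sentiment lexicons and the proxies used here are hypothetical; a real
    system would use a full lexicon or a standard readability index.
    """
    words = [w.strip(".,!?") for w in text.lower().split()]
    n = max(len(words), 1)
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    polarity = (pos - neg) / n                    # net sentiment per word
    subjectivity = (pos + neg) / n                # fraction of opinion words
    readability = sum(len(w) for w in words) / n  # mean word length proxy
    extremeness = text.count("!") / max(len(text), 1)  # exclamation density
    return np.array([polarity, subjectivity, readability, extremeness])
```

In the full model, these four-dimensional peripheral vectors would be concatenated (or fused) with the BERT and VGG-16 central embeddings for each review before the downstream helpfulness predictor.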