# MCHPM: A Multimodal Cue-Based E-Commerce Review Helpfulness Prediction Model

> Integrating the ELM theory from consumer psychology with deep learning, this model achieves more accurate review helpfulness prediction by simultaneously modeling central and peripheral cues from both text and images.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-28T12:03:22.000Z
- Last activity: 2026-04-28T12:22:16.550Z
- Hotness: 159.7
- Keywords: e-commerce, review helpfulness, multimodal, consumer psychology, ELM model, BERT, VGG, attention mechanism
- Page link: https://www.zingnex.cn/en/forum/thread/mchpm
- Canonical: https://www.zingnex.cn/forum/thread/mchpm
- Markdown source: floors_fallback

---

## Main Floor

## Research Background and Problem Definition

On e-commerce platforms, user reviews are an important reference for consumer decision-making. However, faced with massive volumes of review information, identifying which reviews are truly valuable has become a pressing problem. Traditional review helpfulness prediction models rely mainly on deep semantic representations, evaluating helpfulness by analyzing the content of review texts and accompanying images. This approach has an obvious blind spot: it ignores surface-level cues such as text readability, emotional intensity, and image clarity. The MCHPM (Multimodal Cue-based Helpfulness Prediction Model) project is proposed to fill this gap.

## Theoretical Foundation: Elaboration Likelihood Model

The design of MCHPM is inspired by the Elaboration Likelihood Model (ELM) from consumer psychology. ELM describes two parallel routes by which people process information: the Central Route and the Peripheral Route. The Central Route involves careful cognitive engagement, in which the audience thinks deeply about the content and quality of the information; the Peripheral Route relies on surface heuristics, in which the audience uses simple cues to make quick judgments. MCHPM turns this theoretical framework into a computational model that represents both processing routes simultaneously.

## Model Architecture Design

MCHPM adopts a three-stage modular architecture to achieve systematic integration of multimodal cues:

### Stage 1: Cue Extraction

For each modality (text and image), the model extracts both central and peripheral cues simultaneously. Central cues represent deep semantic representations: on the text side, the BERT model is used to extract [CLS] embedding vectors; on the image side, the VGG-16 network is used to extract activation features from the fc2 layer. Peripheral cues capture surface-level features: on the text side, these include polarity, subjectivity, readability, and extremeness indicators; on the image side, they cover visual attributes such as brightness, contrast, saturation, and edge strength. This dual-track parallel design ensures that the model can comprehensively capture various factors affecting review helpfulness.
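The central cues (BERT [CLS] and VGG-16 fc2 activations) require pretrained networks, but the peripheral cues are simple handcrafted features. The sketch below illustrates what such features might look like, using a toy sentiment lexicon and average-sentence-length readability as stand-ins for the paper's actual feature definitions (the function names, the lexicon, and the `extremeness` formula are all assumptions, not the authors' exact specification):

```python
import numpy as np

# Hypothetical mini-lexicon; a real system would use a sentiment library instead.
POS = {"great", "excellent", "love", "perfect"}
NEG = {"bad", "terrible", "broken", "poor"}

def text_peripheral_cues(review: str, rating: int) -> dict:
    """Surface-level text cues: polarity, a subjectivity proxy, readability, extremeness."""
    tokens = [w for w in review.lower().replace(".", " . ").split() if w != "."]
    n_sent = max(review.count("."), 1)
    pos = sum(w in POS for w in tokens)
    neg = sum(w in NEG for w in tokens)
    return {
        "polarity": (pos - neg) / max(len(tokens), 1),
        "subjectivity": (pos + neg) / max(len(tokens), 1),  # crude proxy
        "readability": len(tokens) / n_sent,                # avg sentence length
        "extremeness": abs(rating - 3) / 2.0,               # distance from a neutral star
    }

def image_peripheral_cues(img: np.ndarray) -> dict:
    """Surface-level image cues from an HxWx3 float array with values in [0, 1]."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    return {
        "brightness": float(gray.mean()),
        "contrast": float(gray.std()),
        "saturation": float((img.max(axis=2) - img.min(axis=2)).mean()),
        "edge_strength": float(np.hypot(gx, gy).mean()),
    }
```

Each cue vector would then be embedded and passed, alongside the deep central representation of the same modality, into the co-attention stage.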

### Stage 2: Intra-Modality Co-Attention

Within each modality, central and peripheral cues interact through a co-attention mechanism. Specifically, the central representation queries the peripheral representation, and the peripheral representation also queries the central representation; the two attention-weighted outputs are fused via element-wise multiplication. This design simulates the cognitive process of humans when reading reviews: focusing on both what the review says (central) and how it says it (peripheral). The same pattern is applied independently to both text and image sides to generate modality-specific integrated vectors.
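The bidirectional querying and multiplicative fusion described above can be sketched in NumPy. This is a minimal illustration assuming both cue sets have been projected to a shared dimension `d`, with a random bilinear interaction matrix `W` standing in for learned parameters; the exact attention formulation in the paper may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(central, peripheral, W):
    """Bidirectional co-attention between central cues (n x d) and peripheral
    cues (m x d), fused by element-wise multiplication, as in Stage 2."""
    # Affinity matrix: how strongly each central row relates to each peripheral row.
    affinity = central @ W @ peripheral.T                       # (n, m)
    # Central queries peripheral: a peripheral context for each central position.
    ctx_for_central = softmax(affinity, axis=1) @ peripheral    # (n, d)
    # Peripheral queries central: a central context for each peripheral position.
    ctx_for_peripheral = softmax(affinity.T, axis=1) @ central  # (m, d)
    # Pool each side to a vector, then fuse by element-wise multiplication.
    return ctx_for_central.mean(axis=0) * ctx_for_peripheral.mean(axis=0)  # (d,)

rng = np.random.default_rng(0)
d = 8
central = rng.normal(size=(5, d))     # e.g. BERT token states, projected to d
peripheral = rng.normal(size=(4, d))  # e.g. 4 surface cues, embedded to d
W = rng.normal(size=(d, d))
fused = co_attention(central, peripheral, W)
```

Running the same function once for the text pair and once for the image pair yields the two modality-specific integrated vectors consumed by Stage 3.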

### Stage 3: Gated Multimodal Fusion

The text and image vectors produced by co-attention are first passed through a tanh projection layer for nonlinear transformation and then fed into the Gated Multimodal Unit (GMU). The GMU uses a sigmoid gating mechanism to adaptively determine the weight contribution of the two modalities based on the current input. This dynamic fusion strategy lets the model handle different types of reviews flexibly: text receives higher weight for descriptive reviews, while the influence of images is amplified for reviews rich in visual information.
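The gating logic can be written out compactly. Below is a minimal NumPy sketch of a GMU in the style of Arevalo et al.: each modality is tanh-projected, and a sigmoid gate `z` decides, per dimension, how much text versus image contributes. The weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_fuse(x_text, x_img, Wt, Wv, Wz):
    """Gated Multimodal Unit: per-dimension convex mix of the two modalities."""
    h_t = np.tanh(Wt @ x_text)  # projected text vector
    h_v = np.tanh(Wv @ x_img)   # projected image vector
    z = sigmoid(Wz @ np.concatenate([x_text, x_img]))  # per-dim modality gate
    return z * h_t + (1.0 - z) * h_v

rng = np.random.default_rng(1)
d = 6
x_text, x_img = rng.normal(size=d), rng.normal(size=d)
Wt, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wz = rng.normal(size=(d, 2 * d))
h = gmu_fuse(x_text, x_img, Wt, Wv, Wz)
```

Because the output is a convex combination of two tanh-bounded vectors, every fused dimension stays in (-1, 1), and the gate `z` is what shifts weight toward text or image depending on the input.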

## Prediction Target and Evaluation

The model defines review helpfulness as a continuous variable, using the logarithmically transformed number of helpful votes as the regression target: log(1 + helpful_vote). This transformation compresses the heavily skewed vote distribution while still distinguishing zero-vote reviews, since log(1 + 0) = 0. For evaluation, the project uses multiple metrics to measure model performance: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).
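The target transform and the four metrics are standard and can be written directly; the sketch below is a straightforward NumPy rendering (the `eps` guard in MAPE, needed because zero-vote reviews have a target of exactly 0, is an implementation assumption rather than something the source specifies):

```python
import numpy as np

def helpfulness_target(helpful_votes):
    """Regression target: log(1 + votes) tames the skew and keeps zero-vote reviews."""
    return np.log1p(np.asarray(helpful_votes, dtype=float))

def evaluate(y_true, y_pred, eps=1e-8):
    """MAE / MSE / RMSE / MAPE on the log-transformed scale."""
    err = y_true - y_pred
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        # eps guards against division by zero when y_true is 0 (zero-vote reviews)
        "MAPE": (np.abs(err) / np.maximum(np.abs(y_true), eps)).mean() * 100,
    }

y = helpfulness_target([0, 1, 9, 99])  # [0, log 2, log 10, log 100]
metrics = evaluate(y, y)               # perfect predictions: every metric is 0
```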
