For each modality (text and image), the model extracts central and peripheral cues in parallel. Central cues are deep semantic representations: on the text side, a BERT model provides the [CLS] embedding vector; on the image side, a VGG-16 network provides the activations of the fc2 layer. Peripheral cues capture surface-level features: for text, these include polarity, subjectivity, readability, and extremeness indicators; for images, they cover visual attributes such as brightness, contrast, saturation, and edge strength. This dual-track design lets the model capture both the semantic content of a review and the presentation-level factors that influence its perceived helpfulness.
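The central cues would typically be obtained by running pretrained encoders (e.g., a Hugging Face BERT model for the [CLS] vector and a torchvision VGG-16 for the fc2 activations). The peripheral cues are cheaper to illustrate directly. The sketch below shows one plausible set of formulas for them; the paper does not specify exact definitions, so the lexicons, the readability proxy, and the gradient-based edge measure here are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def image_peripheral_cues(img):
    """Surface-level visual cues from an RGB image in [0, 1], shape (H, W, 3).

    Returns [brightness, contrast, saturation, edge_strength]; the exact
    formulas are illustrative stand-ins, not the paper's definitions.
    """
    # Luminance via the ITU-R BT.601 weighting of the RGB channels.
    lum = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    brightness = lum.mean()          # global mean luminance
    contrast = lum.std()             # RMS contrast
    # HSV-style saturation: (max - min) / max per pixel, 0 for black pixels.
    cmax, cmin = img.max(axis=-1), img.min(axis=-1)
    sat = np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-8), 0.0)
    saturation = sat.mean()
    # Edge strength: mean gradient magnitude of the luminance channel.
    gy, gx = np.gradient(lum)
    edge_strength = np.sqrt(gx ** 2 + gy ** 2).mean()
    return np.array([brightness, contrast, saturation, edge_strength])

def text_peripheral_cues(text,
                         positive=frozenset({"great", "good", "love"}),
                         negative=frozenset({"bad", "awful", "hate"})):
    """Surface-level textual cues from a review string.

    Returns [polarity, subjectivity, readability, extremeness]. The tiny
    sentiment lexicons and the proxies used here are hypothetical; a real
    system would use a full lexicon or a standard readability index.
    """
    words = [w.strip(".,!?") for w in text.lower().split()]
    n = max(len(words), 1)
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    polarity = (pos - neg) / n                    # net sentiment per word
    subjectivity = (pos + neg) / n                # fraction of opinion words
    readability = sum(len(w) for w in words) / n  # mean word length proxy
    extremeness = text.count("!") / max(len(text), 1)  # exclamation density
    return np.array([polarity, subjectivity, readability, extremeness])
```

In the full model, these four-dimensional peripheral vectors would be concatenated (or fused) with the BERT and VGG-16 central embeddings for each review before the downstream helpfulness predictor.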