Zing Forum

NLP-ReviewEngine: An Intelligent E-commerce Review Analysis System Based on Classic NLP Techniques

An end-to-end natural language processing pipeline designed for analyzing customer reviews on e-commerce platforms. The system supports mixed English and Roman Urdu text and performs sentiment analysis, customer intent classification, and discovery of hidden themes with NMF (non-negative matrix factorization).

Tags: NLP, sentiment-analysis, e-commerce, machine-learning, TF-IDF, topic-modeling, multilingual, Roman-Urdu, VADER, text-classification
Published 2026-06-06 05:45 | Recent activity 2026-06-06 05:48 | Estimated read 6 min

Section 01

Introduction to NLP-ReviewEngine: An Intelligent E-commerce Review Analysis System Based on Classic NLP Techniques

Project Overview: NLP-ReviewEngine is an end-to-end NLP pipeline for analyzing customer reviews on e-commerce platforms. It handles code-mixed English and Roman Urdu text, and its core functions are sentiment analysis, customer intent classification, and discovery of hidden themes via NMF.

Section 02

Project Background and Significance

With the rapid growth of e-commerce, customer reviews have become key data for consumer decisions and merchant improvement. However, manually analyzing large volumes of multilingual reviews (especially the code-mixed English and Roman Urdu common in South Asian markets) is time-consuming and misses patterns that only emerge across many reviews. NLP-ReviewEngine addresses this with a complete NLP pipeline that automatically processes mixed-language text, extracts sentiment, identifies customer intent, and discovers hidden themes, offering a practical basis for intelligent customer service and review analytics in e-commerce.

Section 03

System Architecture and Core Technologies

Overall Architecture: A modular, end-to-end design with five components: data preprocessing, feature extraction, sentiment analysis, intent classification, and topic modeling.

  • Text Preprocessing: A six-step process (lowercase conversion → URL removal → punctuation cleaning → tokenization → stopword filtering → lemmatization). Roman Urdu words are kept in their original form: the NLTK stopword list is English-oriented and has no Roman Urdu entries, so Urdu function words such as "hai" are not filtered out.
  • Feature Extraction: The pipeline compares Bag-of-Words (simple raw counts, but no notion of how words relate) with TF-IDF (down-weights words that occur in many reviews so distinctive terms stand out, and performs better in the classification tasks). High-frequency feature words include "product", "quality", "hai" (Urdu for "is"), and "delivery".
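
The six preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the tiny `ENGLISH_STOPWORDS` set stands in for NLTK's English stopword list, and the `LEMMAS` lookup table stands in for a real lemmatizer such as NLTK's WordNet lemmatizer; both, like the `preprocess` name, are hypothetical. It shows how Roman Urdu tokens pass through both filters untouched.

```python
import re
import string

# Hypothetical stand-in for NLTK's English stopword list.
# It has no Roman Urdu entries, so tokens like "hai" survive filtering.
ENGLISH_STOPWORDS = {"the", "is", "a", "an", "and", "was", "were", "it",
                     "this", "to", "of", "very", "i", "but"}

# Hypothetical stand-in for a real lemmatizer (e.g. WordNet-based).
LEMMAS = {"products": "product", "arrived": "arrive", "shoes": "shoe", "sizes": "size"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text: str) -> list:
    text = text.lower()                        # 1. lowercase
    text = URL_PATTERN.sub(" ", text)          # 2. remove URLs before "/" and ":" are stripped
    text = text.translate(PUNCT_TO_SPACE)      # 3. replace punctuation with spaces
    tokens = text.split()                      # 4. whitespace tokenization
    tokens = [t for t in tokens if t not in ENGLISH_STOPWORDS]  # 5. English stopwords only
    return [LEMMAS.get(t, t) for t in tokens]  # 6. lemmatize; unknown words pass through

print(preprocess("The product quality is bohat achi hai! Check https://shop.example.com"))
```

In this example "bohat", "achi", and "hai" all remain in the output because none of them appear in the English stopword set or the lemma table.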
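
To make the Bag-of-Words versus TF-IDF comparison concrete, the sketch below computes both from already-tokenized reviews. The IDF formula, ln((1 + n) / (1 + df)) + 1 followed by L2 row normalization, mirrors scikit-learn's `TfidfVectorizer` defaults; whether the project used that library is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Raw term counts per document over a sorted shared vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    counts = []
    for d in docs:
        c = Counter(d)
        counts.append([c[t] for t in vocab])
    return vocab, counts

def tfidf(docs):
    """Counts weighted by smoothed IDF, then L2-normalized per document."""
    vocab, counts = bag_of_words(docs)
    n = len(docs)
    # Document frequency: in how many reviews each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    matrix = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid dividing by 0
        matrix.append([x / norm for x in weighted])
    return vocab, matrix

docs = [["quality", "hai"], ["delivery", "hai"], ["refund", "hai"]]
print(bag_of_words(docs))
print(tfidf(docs))
```

With these three toy reviews, Bag-of-Words gives "hai" the same count as each review's distinctive word, whereas TF-IDF weights the distinctive word higher because "hai" appears in every document.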

Section 04

Sentiment Analysis and Intent Recognition

Sentiment Analysis:

  • VADER Rule Engine: Designed for social media text and handles emojis and slang. Test set accuracy: 65.45%, F1 score: 0.66. Precision on negative reviews reaches 0.81, but neutral and Roman Urdu text are poorly recognized.
  • Logistic Regression Classifier: Trained on TF-IDF features, with an expected F1 score of 0.85-0.91 on the full dataset.

Intent Recognition: Four intent categories (refund request, delivery issue, complaint feedback, general inquiry). For example, trigger words for refund requests include "refund", "money back", and "paisa wapas" (Roman Urdu for "money back"). Complaint feedback accounts for approximately 80 entries.
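
The Roman Urdu weakness of a lexicon-based engine like VADER follows from how it scores text: word valences are summed and squashed into a compound score in (-1, 1) via x / sqrt(x^2 + alpha) with alpha = 15, and VADER's authors suggest ±0.05 as the positive/negative cut-offs. The sketch below reproduces only that normalization and thresholding; it omits VADER's heuristics (negation, intensifiers, capitalization, emojis), and `TOY_LEXICON` with its valence values is hypothetical.

```python
import math

# Hypothetical toy valences; VADER's real lexicon is far larger and
# English-oriented, with no Roman Urdu entries.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "love": 3.2, "bad": -2.5, "terrible": -2.1}

def normalize(score, alpha=15.0):
    """Squash a raw valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)

def compound(tokens):
    # Unknown words (including all Roman Urdu here) contribute 0.
    raw = sum(TOY_LEXICON.get(t, 0.0) for t in tokens)
    return normalize(raw)

def label(tokens, threshold=0.05):
    """Map the compound score to a class using the conventional +/-0.05 cut-offs."""
    score = compound(tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

print(label("delivery bohat bekar thi".split()))  # neutral
```

A clearly negative Roman Urdu review such as "delivery bohat bekar thi" ("the delivery was very bad") scores exactly 0 and lands in the neutral class, because none of its words are in the lexicon.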
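
The trigger-word approach to intent recognition can be sketched as an ordered set of keyword rules with a fallback class. Only the refund triggers come from the description above; the delivery and complaint trigger lists, the rule order, and the `classify_intent` name are hypothetical. Word-boundary matching keeps "late" from firing inside words like "chocolate".

```python
import re

# Rules are checked in insertion order; the first match wins.
INTENT_TRIGGERS = {
    "refund_request": ["refund", "money back", "paisa wapas"],          # from the source
    "delivery_issue": ["delivery", "late", "courier", "not arrived"],   # hypothetical
    "complaint_feedback": ["damaged", "broken", "worst", "bekar"],      # hypothetical
}
DEFAULT_INTENT = "general_inquiry"

# One compiled pattern per intent, matching any trigger as a whole word/phrase.
PATTERNS = {
    intent: re.compile(r"\b(?:" + "|".join(map(re.escape, triggers)) + r")\b")
    for intent, triggers in INTENT_TRIGGERS.items()
}

def classify_intent(review):
    text = review.lower()
    for intent, pattern in PATTERNS.items():
        if pattern.search(text):
            return intent
    return DEFAULT_INTENT

print(classify_intent("Mujhe paisa wapas chahiye"))  # refund_request
```

Because rules are checked in dictionary order, a review that mentions both damage and a refund is labeled a refund request; reordering `INTENT_TRIGGERS` changes that priority.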

Section 05

Topic Modeling and Data Engineering

Topic Modeling: NMF discovers five latent themes (product quality, size fit, delivery logistics, return/refund, and overall evaluation), which align with core e-commerce concerns.

Data Processing: When no real dataset is available, the system generates 55 synthetic reviews. To prevent data leakage, the data is first split into training and test sets (44/11); only then is the training set replicated sixfold (44 × 6 = 264 samples), so the test set stays unseen. Duplicating before the split would place copies of the same review on both sides and inflate test scores.
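
The split-then-oversample order is what keeps the test set clean. A minimal sketch, assuming a simple random shuffle (the source does not say whether the split was shuffled or stratified) and a hypothetical `split_then_oversample` helper, using the document's numbers: 55 reviews, an 11-review test set, and a sixfold copy of the 44 training reviews.

```python
import random

def split_then_oversample(reviews, test_size=11, copies=6, seed=42):
    """Hypothetical helper: split first, then replicate only the training part."""
    rng = random.Random(seed)
    shuffled = list(reviews)
    rng.shuffle(shuffled)                 # assumed: plain random split
    test = shuffled[:test_size]           # held out before any duplication
    train = shuffled[test_size:]
    train_oversampled = train * copies    # e.g. 44 reviews x 6 copies = 264 samples
    rng.shuffle(train_oversampled)        # mix copies so they are not grouped together
    return train_oversampled, test

reviews = [f"review_{i}" for i in range(55)]
train, test = split_then_oversample(reviews)
print(len(train), len(test))  # 264 11
```

Since every copy is made from reviews already assigned to training, no test review can appear in the training data, however many copies are made.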
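
NMF factors a non-negative document-term matrix V (for example, TF-IDF weights) into W (document-topic weights) and H (topic-term weights), so that V ≈ WH; each row of H, read by its largest entries, is a theme. The stdlib sketch below uses the classic Lee-Seung multiplicative updates on a tiny hypothetical matrix. In practice a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would normally be used, and all helper names here are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(v, w, h):
    """||V - WH||_F, the quantity the multiplicative updates do not increase."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def top_terms(h, vocab, n=3):
    """List the n highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]

# Hypothetical toy document-term matrix: two reviews about fabric quality,
# two about delivery.
vocab = ["quality", "fabric", "delivery", "courier"]
V = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
W, H = nmf(V, k=2, iters=300)
print(top_terms(H, vocab, n=2))
```

Multiplicative updates keep every entry non-negative because they only multiply by non-negative ratios, which is why the resulting topics can be read as additive bundles of words.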

Section 06

Demo Interface and Application Prospects

Interactive Demo: The system is integrated with the Gradio framework, providing a web interface where users can enter a review and see its sentiment, intent, and theme results in real time.

Application Prospects:

  • Multilingual Market Value: Suitable for code-mixed text scenarios in South Asia/Middle East.
  • Advantages: Better suited than LLMs where resources are constrained, interpretability matters, or fast deployment is required.
  • Learning Example: Covers the full NLP pipeline, serving as a reference for NLP beginners and text classification learners.