Multi-label Text Sentiment Classification: Comparative Experiments of Five Machine Learning Models Based on the GoEmotions Dataset

A multi-label sentiment classification course project by a Vietnamese student team, comparing the performance of five algorithms (Logistic Regression, LinearSVC, Random Forest, 1D CNN, and Bi-LSTM) on Google's GoEmotions dataset.

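Tags: multi-label classification, sentiment analysis, GoEmotions, NLP, machine learning, deep learning, Bi-LSTM, CNN, random forest, text classification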
Published 2026-06-03 07:45 | Recent activity 2026-06-03 07:48 | Estimated read 5 min

Section 01

Comparison of Multi-label Sentiment Classification Models: Experimental Study by a Vietnamese Student Team Based on GoEmotions

This course project by a Vietnamese student team addresses multi-label text sentiment classification. It compares five algorithms (Logistic Regression, LinearSVC, Random Forest, 1D CNN, and Bi-LSTM) on Google's GoEmotions dataset and discusses the technical challenges and optimization strategies specific to multi-label settings.

Section 02

Project Background and Challenges of Multi-label Sentiment Classification

In natural language processing, sentiment analysis has evolved from binary polarity classification to fine-grained, multi-dimensional emotion recognition. Google released the GoEmotions dataset in 2020; it contains about 58,000 Reddit comments annotated with 27 fine-grained emotion categories plus a neutral label (28 labels in total). Multi-label classification poses its own challenges: a single text may carry several emotion labels at once, and label sparsity and co-occurrence patterns make it hard to apply traditional single-label methods directly.
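
To make the multi-label setting concrete, the sketch below encodes a handful of texts as multi-hot vectors and measures label sparsity and co-occurrence. It is an illustration only: the label names are real GoEmotions categories, but the sample texts, the variable names, and the helper `to_multi_hot` are invented for this example and are not taken from the project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sample: each comment carries one or more emotion labels.
# The texts are invented; only the label names come from GoEmotions.
samples = [
    ("Thanks so much, this made my day!", ["gratitude", "joy"]),
    ("LOL that was hilarious, love it", ["amusement", "love"]),
    ("Thank you, I love this community", ["gratitude", "love"]),
    ("It is what it is.", ["neutral"]),
]

# Fixed label order so every text maps to a vector of the same length.
labels = sorted({name for _, tags in samples for name in tags})
index = {name: i for i, name in enumerate(labels)}


def to_multi_hot(tags):
    """Return a 0/1 vector with a 1 at the position of every assigned label."""
    row = [0] * len(labels)
    for name in tags:
        row[index[name]] = 1
    return row


Y = [to_multi_hot(tags) for _, tags in samples]

# Label sparsity: fraction of (text, label) cells that are active.
density = sum(map(sum, Y)) / (len(Y) * len(labels))

# Co-occurrence: how often each pair of labels appears on the same text.
cooccurrence = Counter(
    pair for _, tags in samples for pair in combinations(sorted(tags), 2)
)
```

Each row of `Y` is the target for one text. A one-vs-rest classifier treats every column as a separate binary problem, which is why sparse columns (rare emotions) are the hard ones.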

Section 03

System Architecture and Experimental Methods

The project uses an end-to-end pipeline architecture:

  1. Preprocessing: lowercasing, special-character removal, tokenization, lemmatization;
  2. Feature representation: TF-IDF vectors (for the traditional models), Word2Vec embeddings (for the deep learning models);
  3. Classification models: traditional methods (Logistic Regression and LinearSVC in a one-vs-rest (OvR) setup, Random Forest) and deep learning methods (1D CNN, Bi-LSTM);
  4. Threshold optimization: tune the decision threshold of each label independently to maximize its F1 score.
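
The threshold optimization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `f1_binary` and `tune_thresholds`, the threshold grid, and the toy probabilities are all made up for the example. It assumes a model that already outputs one probability per label.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 for one label; returns 0.0 when there are no true positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    return float(2 * tp / (2 * tp + fp + fn))


def tune_thresholds(y_true, y_prob, grid=None):
    """Pick, for each label column, the grid threshold with the highest F1."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed grid: 0.05, 0.10, ..., 0.95
    best = np.full(y_true.shape[1], 0.5)
    for j in range(y_true.shape[1]):
        scores = [
            f1_binary(y_true[:, j], (y_prob[:, j] >= t).astype(int))
            for t in grid
        ]
        best[j] = grid[int(np.argmax(scores))]  # first maximum wins on ties
    return best


# Toy validation data: column 0 is a frequent label, column 1 a rare label
# whose predicted probabilities never reach the default 0.5 cut-off.
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 0], [0, 1], [1, 0]])
y_prob = np.array([
    [0.91, 0.07],
    [0.72, 0.12],
    [0.22, 0.33],
    [0.41, 0.08],
    [0.13, 0.27],
    [0.63, 0.11],
])

thresholds = tune_thresholds(y_true, y_prob)
f1_default = f1_binary(y_true[:, 1], (y_prob[:, 1] >= 0.5).astype(int))
f1_tuned = f1_binary(y_true[:, 1], (y_prob[:, 1] >= thresholds[1]).astype(int))
```

In this toy case the rare label is never predicted at the default 0.5 threshold, so its F1 is zero, while a lower per-label threshold recovers it. In practice the thresholds would normally be chosen on a held-out validation split and then applied unchanged to the test set, so that the reported scores are not tuned on the data they are measured on.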

Section 04

Experimental Results and Analysis of Model Characteristics

Quantitative Evaluation: The dataset is severely class-imbalanced. High-frequency categories (e.g., amusement, gratitude, love, neutral) are classified well, whereas low-frequency categories (e.g., sadness, pride) and categories with ambiguous semantic boundaries are prone to confusion.

Qualitative Comparison:

  • Logistic Regression/LinearSVC: perform well on linearly separable cases but handle non-linear semantic combinations poorly;
  • Random Forest: robust on texts that express contradictory emotions;
  • 1D CNN: strong at local feature extraction, with excellent performance on short texts;
  • Bi-LSTM: good at retaining long-distance semantic dependencies, which suits complex mixed emotions.
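
The effect of class imbalance on the scores can be illustrated with per-label, macro, and micro F1. The sketch below is an assumption-laden toy, not the project's evaluation code: the helper names `per_label_f1` and `micro_f1` and the eight-row example are invented, with column 0 standing in for a frequent label and column 1 for a rare one.

```python
import numpy as np


def per_label_f1(y_true, y_pred):
    """Return one F1 score per label column (0.0 if a label has no true positives)."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    # np.maximum avoids division by zero for labels with no positives at all.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels before computing a single F1."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


# Toy data: the frequent label (column 0) is predicted perfectly,
# the rare label (column 1) is never predicted.
y_true = np.array([[1, 0], [1, 0], [1, 0], [1, 0],
                   [1, 0], [1, 1], [0, 1], [1, 0]])
y_pred = y_true.copy()
y_pred[:, 1] = 0

scores = per_label_f1(y_true, y_pred)
macro = float(scores.mean())      # every label counts equally
micro = micro_f1(y_true, y_pred)  # frequent labels dominate
```

Here the micro average stays high because the frequent label dominates the pooled counts, while the macro average exposes the failure on the rare label. Reporting per-label or macro F1 alongside micro F1 is therefore a reasonable way to surface the low-frequency categories mentioned above.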

Section 05

Practical Testing and Technical Implementation

Practical Testing: Four challenging test cases (coexistence of multiple emotions, contradictory semantics, clear-cut features, and complex mixtures) were designed to simulate real-world scenarios.

Technical Implementation: The project runs on Google Colab, with the code organized in Jupyter notebooks, and supports GPU acceleration and a live demo. The code is modular, which makes it easy to reproduce.

Section 06

Project Insights and Extended Reflections

Takeaways from the project for production-grade systems:

  • Threshold tuning is a necessary engineering practice in multi-label scenarios;
  • Model selection should match the scenario (CNN suits short texts, LSTM suits long texts, and traditional methods offer strong interpretability);
  • Data quality matters more than model complexity.

This project provides a complete learning example of multi-label classification for beginners.