Multi-label Text Sentiment Classification: Comparative Experiments of Five Machine Learning Models Based on the GoEmotions Dataset

A multi-label sentiment classification course project by a Vietnamese student team, comparing the performance of five algorithms (Logistic Regression, LinearSVC, Random Forest, 1D CNN, and Bi-LSTM) on Google's GoEmotions dataset

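Tags: multi-label classification, sentiment analysis, GoEmotions, NLP, machine learning, deep learning, Bi-LSTM, CNN, random forest, text classification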

Published 2026-06-03 07:45 | Recent activity 2026-06-03 07:48 | Estimated read 5 min

Section 01

Comparison of Multi-label Sentiment Classification Models: Experimental Study by a Vietnamese Student Team Based on GoEmotions

A course project by a Vietnamese student team, focusing on the multi-label text sentiment classification task. It compares the performance of five algorithms (Logistic Regression, LinearSVC, Random Forest, 1D CNN, and Bi-LSTM) on Google's GoEmotions dataset, and discusses technical challenges and optimization strategies in multi-label scenarios.

Section 02

Project Background and Challenges of Multi-label Sentiment Classification

In natural language processing, sentiment analysis has evolved from binary positive/negative classification to fine-grained, multi-dimensional emotion recognition. Google researchers introduced the GoEmotions dataset in 2020; it contains about 58,000 Reddit comments annotated with 27 fine-grained emotion categories plus Neutral, for 28 labels in total. Multi-label classification brings its own challenges: a single text may carry several emotion labels at once, most labels are absent from most texts (label sparsity), and labels co-occur in patterns that single-label methods cannot model directly.
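To make the multi-label setting concrete, the following minimal sketch encodes a few comments as multi-hot vectors and computes label cardinality and label co-occurrence counts. The four example comments and their annotations are hypothetical toy data, not rows from GoEmotions, and the names `samples` and `to_multi_hot` are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: each comment carries one or more emotion labels.
samples = [
    ("Thanks so much, this made my day!", ["gratitude", "joy"]),
    ("I can't believe they cancelled it.", ["disappointment", "surprise"]),
    ("Okay.", ["neutral"]),
    ("Thank you, I love it!", ["gratitude", "love"]),
]

# Fixed label order so that every vector uses the same column layout.
labels = sorted({label for _, tags in samples for label in tags})
index = {label: i for i, label in enumerate(labels)}


def to_multi_hot(tags):
    """Return a 0/1 vector with a 1 in the column of every label in `tags`."""
    vec = [0] * len(labels)
    for tag in tags:
        vec[index[tag]] = 1
    return vec


Y = [to_multi_hot(tags) for _, tags in samples]

# Label cardinality: average number of labels per comment.
label_cardinality = sum(sum(row) for row in Y) / len(Y)

# Label frequency and pairwise co-occurrence counts.
label_counts = Counter(tag for _, tags in samples for tag in tags)
pair_counts = Counter(
    pair for _, tags in samples for pair in combinations(sorted(tags), 2)
)

print(labels)
print(Y)
print(label_cardinality, label_counts, pair_counts)
```

Each row of `Y` is mostly zeros, which is the label sparsity described above, and `pair_counts` shows which labels tend to appear together.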

Section 03

System Architecture and Experimental Methods

The project uses an end-to-end pipeline architecture:

Preprocessing: Lowercase conversion, special character removal, tokenization, lemmatization;
Feature representation: TF-IDF vectors (for traditional models), Word2Vec embeddings (for deep learning models);
Classification models: Traditional methods (one-vs-rest Logistic Regression, one-vs-rest LinearSVC, Random Forest), deep learning methods (1D CNN, Bi-LSTM);
Threshold optimization: Independently tune thresholds for each label to maximize F1 score.
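The preprocessing and threshold optimization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `preprocess` omits lemmatization because that step needs an external library, the threshold grid of 0.05 to 0.95 is an arbitrary choice, and the validation labels and probabilities are hypothetical.

```python
import re


def preprocess(text):
    """Lowercase, replace special characters with spaces, split on whitespace.

    Lemmatization is intentionally omitted here (it needs an external library).
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def f1_score(y_true, y_pred):
    """Binary F1 for one label; returns 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(Y_true, Y_prob, grid=None):
    """For each label column, pick the lowest grid threshold with the best F1."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    thresholds = []
    for j in range(len(Y_true[0])):
        truth = [row[j] for row in Y_true]
        probs = [row[j] for row in Y_prob]
        scored = [
            (f1_score(truth, [int(p >= t) for p in probs]), t) for t in grid
        ]
        best_f1 = max(score for score, _ in scored)
        thresholds.append(next(t for score, t in scored if score == best_f1))
    return thresholds


# Hypothetical validation data: two labels, four comments.
Y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
Y_prob = [[0.9, 0.2], [0.6, 0.1], [0.4, 0.3], [0.2, 0.05]]

tokens = preprocess("I LOVE this!!! :)")
thresholds = tune_thresholds(Y_true, Y_prob)
print(tokens, thresholds)
```

In this toy example the second label never receives a probability above 0.3, so a fixed cutoff of 0.5 would never predict it; a per-label threshold lets rare labels be recovered. Thresholds should be tuned on a validation split, not on the test set.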

Section 04

Experimental Results and Analysis of Model Characteristics

Quantitative Evaluation: The dataset has extreme class imbalance. High-frequency categories (e.g., amusement, gratitude, love, neutral) perform well, while low-frequency categories (e.g., sadness, pride) and categories with ambiguous semantic boundaries are prone to confusion.

Qualitative Comparison:

Logistic Regression/LinearSVC: Perform well in linear scenarios but are weak in handling non-linear semantic combinations;
Random Forest: Robust in handling contradictory emotions;
1D CNN: Strong at local feature extraction, excellent performance on short texts;
Bi-LSTM: Strong ability to maintain long-distance semantic dependencies, suitable for complex mixed emotions.
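The effect of class imbalance noted in the quantitative evaluation can be made visible by reporting per-label F1 alongside macro and micro averages. The sketch below is an illustration only: the source does not state which averaging the project reported, and the two-label predictions are hypothetical toy data.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives) for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(Y_true, Y_pred, label_names):
    """Return per-label F1, macro F1 (mean over labels), and micro F1 (pooled)."""
    per_label = {}
    total_tp = total_fp = total_fn = 0
    for j, name in enumerate(label_names):
        tp, fp, fn = confusion_counts(
            [row[j] for row in Y_true], [row[j] for row in Y_pred]
        )
        per_label[name] = f1_from_counts(tp, fp, fn)
        total_tp += tp
        total_fp += fp
        total_fn += fn
    macro = sum(per_label.values()) / len(per_label)
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return per_label, macro, micro


# Hypothetical predictions: a frequent label and a rare label the model misses.
label_names = ["neutral", "pride"]
Y_true = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
Y_pred = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 0]]

per_label, macro, micro = evaluate(Y_true, Y_pred, label_names)
print(per_label, macro, micro)
```

Micro F1 is dominated by the frequent label, while macro F1 weights every label equally and therefore exposes the missed rare label. Reporting both, together with the per-label scores, gives a fairer picture on an imbalanced dataset.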

Section 05

Practical Testing and Technical Implementation

Practical Testing: Four challenging use cases (coexistence of multiple emotions, contradictory semantics, clear features, complex mixing) were designed to simulate real scenarios.

Technical Implementation: The project runs on Google Colab, with the code organized in a Jupyter Notebook that supports GPU acceleration and real-time demonstration. The code is modular for easy reproduction.

Section 06

Project Insights and Extended Reflections

Lessons from the project that carry over to industrial-grade systems:

Threshold tuning is a necessary engineering practice in multi-label scenarios;
Model selection needs to balance scenario requirements (CNN is suitable for short texts, LSTM for long texts, traditional methods have strong interpretability);
Data quality is more important than model complexity.

This project provides a complete learning example of multi-label classification for beginners.
