Cross-Domain Sentiment Analysis: When DistilBERT Meets TF-IDF, Are Large Models Always Better?

A comparative study reveals a counterintuitive finding: in cross-domain scenarios, the simple TF-IDF + Logistic Regression model performs almost on par with DistilBERT, while the performance decay rate of expensive Transformer models during domain transfer is 2.4 times that of traditional methods.

Tags: Sentiment Analysis, Cross-Domain, DistilBERT, TF-IDF, Logistic Regression, Domain Shift, Transformer, Machine Learning, NLP

Published 2026-05-30 11:14 | Recent activity 2026-05-30 11:24 | Estimated read 6 min

Section 01

Cross-Domain Sentiment Analysis: DistilBERT vs TF-IDF+LR, Are Large Models Always Better?

This article compares a classic method with a modern Transformer model in cross-domain sentiment analysis. Key findings: In the Twitter→IMDB transfer, TF-IDF + Logistic Regression performs almost on par with DistilBERT, while DistilBERT's accuracy drop is about 2.4 times that of the traditional method. Original author: aarogyaojha. Source: GitHub (https://github.com/aarogyaojha/sentiment_analysis). Publication date: May 30, 2026.

Section 02

Research Background and Motivation

The research stems from a practical problem: how should a model be chosen when it must run on data outside its training distribution? Two methods are compared: a classic one (TF-IDF + Logistic Regression) and a modern one (DistilBERT). Core question: Do the advantages of large models persist under domain transfer?

Section 03

Experimental Design Details

Dataset Configuration: The training set is Sentiment140 tweets (1.6 million for DistilBERT, 160,000 for TF-IDF + LR); the test set is 25,000 IMDB movie reviews, forming the Twitter→IMDB cross-domain scenario. Evaluation Metrics: Accuracy, precision, recall, and F1 are reported; McNemar's test and the chi-square test are used to verify significance.
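McNemar's test, named above, compares two classifiers on the same test items and looks only at the items where exactly one of them is correct. The following is a minimal sketch, assuming the chi-square form with continuity correction; the function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the original repository.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-square form, continuity-corrected).

    Returns (statistic, p_value). Only discordant pairs matter:
    b = items model A gets right and model B gets wrong,
    c = items model A gets wrong and model B gets right.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in rows if a == t and p != t)
    c = sum(1 for t, a, p in rows if a != t and p == t)
    if b + c == 0:
        # The two models never disagree on correctness.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy labels for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_a = [1, 0, 1, 0, 0, 1, 1, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 1]
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={stat:.3f}, p={p_value:.3f}")
```

With only a handful of discordant pairs, as in this toy input, an exact binomial version of the test is usually preferred; the chi-square form is a better fit for test sets the size of IMDB.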

Section 04

Intra-Domain vs Cross-Domain Performance Comparison

Intra-Domain Performance: On the Twitter dataset, DistilBERT achieves an accuracy of 85.0% versus 77.7% for TF-IDF + LR, a lead of 7.3 percentage points (p < 0.001). Cross-Domain Performance: After transfer to IMDB, TF-IDF + LR reaches 72.3% accuracy and DistilBERT 71.9%; the difference is not statistically significant (chi-square = 1.056, p = 0.304). Performance Decay: DistilBERT's accuracy drops by 13.1 percentage points and TF-IDF + LR's by 5.4, so DistilBERT decays about 2.4 times as fast (13.1 / 5.4 ≈ 2.4).
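The decay figures and the reported p-value can be re-derived from the numbers quoted in this section. The sketch below is a sanity check only; it assumes the chi-square statistic has 1 degree of freedom (a 2×2 comparison), which the article does not state explicitly, and the variable and function names are illustrative.

```python
import math

# Accuracies (%) as reported in this section.
in_domain = {"DistilBERT": 85.0, "TF-IDF + LR": 77.7}
cross_domain = {"DistilBERT": 71.9, "TF-IDF + LR": 72.3}

# Accuracy drop in percentage points for each model.
drops = {m: round(in_domain[m] - cross_domain[m], 1) for m in in_domain}

# How many times faster DistilBERT decays than TF-IDF + LR.
decay_ratio = drops["DistilBERT"] / drops["TF-IDF + LR"]


def chi2_p_value_1df(stat):
    """p-value of a chi-square statistic, assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


p_value = chi2_p_value_1df(1.056)

print(drops)
print(f"decay ratio: {decay_ratio:.1f}")
print(f"p-value for chi-square = 1.056: {p_value:.3f}")
```

Under that 1-degree-of-freedom assumption, the computed p-value agrees with the reported 0.304 to three decimal places, which suggests the cross-domain comparison was indeed run on a 2×2 table.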

Section 05

Reasons for Faster Decay of Large Models

The decay is mainly "precision-dominated": precision falls more than recall. The author speculates that DistilBERT over-relies on local positive markers in tweets (emojis, slang) that carry different meanings in movie reviews, which produces more false positives and lowers precision. In contrast, TF-IDF + LR has a more conservative decision boundary, relies less on such local features, and is more robust across domains.

Section 06

Practical Implications

1. Intra-domain accuracy is not sufficient to guide model selection.
2. It is recommended to use the "precision-recall decay ratio" as a cross-domain diagnostic indicator.
3. If cross-domain performance is equivalent, expensive models may not be the best choice.
4. In critical applications, robustness takes priority over peak performance.
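The article does not define the "precision-recall decay ratio" from item 2. One possible reading, offered here only as an assumption, is the drop in precision divided by the drop in recall when moving from the in-domain to the cross-domain test set. The sketch below uses that assumed definition; the function names and all counts are hypothetical and are not results from the study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the positive class from raw counts."""
    return tp / (tp + fp), tp / (tp + fn)


def pr_decay_ratio(in_domain, cross_domain):
    """Assumed definition: precision drop divided by recall drop.

    Each argument is a (tp, fp, fn) tuple. Returns
    (precision_drop, recall_drop, ratio); ratio is None when
    recall does not drop, since the ratio is then undefined.
    """
    p_in, r_in = precision_recall(*in_domain)
    p_x, r_x = precision_recall(*cross_domain)
    p_drop, r_drop = p_in - p_x, r_in - r_x
    ratio = p_drop / r_drop if r_drop > 0 else None
    return p_drop, r_drop, ratio


# Hypothetical counts for illustration only (not from the study).
in_domain_counts = (850, 150, 150)     # precision 0.85, recall 0.85
cross_domain_counts = (800, 400, 200)  # many more false positives after transfer

p_drop, r_drop, ratio = pr_decay_ratio(in_domain_counts, cross_domain_counts)
print(f"precision drop={p_drop:.3f}, recall drop={r_drop:.3f}, ratio={ratio:.2f}")
```

Under this reading, a ratio well above 1 would signal precision-dominated decay, meaning the transferred model is mostly gaining false positives rather than missing true positives.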

Section 07

Limitations and Future Directions

Limitations: The study covers only sentiment analysis and the Twitter→IMDB scenario; results may differ for other tasks or more drastic domain shifts. Future Directions: Explore whether domain adaptation techniques can narrow the gap, whether larger models (GPT-4, Llama) show similar patterns, and whether multi-task learning can improve cross-domain robustness.

Section 08

Research Summary

This study breaks the "bigger is better" intuition: in cross-domain deployment or resource-constrained scenarios, TF-IDF + Logistic Regression may be underestimated. Model selection should consider the characteristics of the deployment environment rather than blindly pursuing new architectures. Models with significant intra-domain advantages may lose their advantages or even be inferior to simple baselines in cross-domain scenarios.
