Intelligent Classification of IT Work Orders: Comparative Practice of TF-IDF and BERT Dual-Track Schemes

A complete IT support work order classification project comparing the traditional TF-IDF+MLP baseline model with the BERT fine-tuning scheme. Through structured hyperparameter search and detailed performance analysis, it demonstrates the trade-offs between classical methods and deep learning in text classification tasks.

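Tags: text classification, BERT, TF-IDF, work order classification, PyTorch, Transformer, ITSM, hyperparameter optimization, machine learning, natural language processing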

Published 2026-06-04 03:44 | Recent activity 2026-06-04 03:54 | Estimated read 7 min

Section 01

[Introduction] Core Summary of Intelligent IT Work Order Classification: Comparative Practice of TF-IDF and BERT Dual-Track Schemes

This project addresses work order classification in the IT Service Management (ITSM) domain by comparing a traditional TF-IDF+MLP baseline with a fine-tuned BERT model. Through a structured hyperparameter search and multi-dimensional performance analysis, it shows how classical methods and deep learning trade off in performance, cost, and interpretability, and offers a reference for model selection in real business scenarios.

Section 02

Project Background and Dataset Feature Analysis

Task and Constraints

The input is the text of an IT work order; the output is one of four categories: Incident, Request, Problem, or Change. Only the text body is used as a feature. The classes are imbalanced (Change accounts for 10.8% of the samples), and the experiments are set up to be reproducible.

Dataset Features

The dataset contains 11,921 samples; after cleaning, only the body and type fields are retained. Because Change is a minority class, macro F1 is used as the main evaluation metric, so each class counts equally regardless of its size.
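To see why macro F1 suits an imbalanced label distribution, here is a minimal sketch in plain Python. The toy labels and the `macro_f1` helper are illustrative assumptions, not the project's data or code: a classifier that never predicts the minority class still scores 90% accuracy, but its macro F1 drops below 0.5.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): 9 Incident tickets, 1 Change ticket.
y_true = ["Incident"] * 9 + ["Change"]
y_pred = ["Incident"] * 10  # a model that ignores the minority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, macro F1={score:.3f}")
```

Accuracy rewards the majority class, while macro F1 averages the per-class scores, so a missed Change class costs half of the total in this two-class toy case.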

Section 03

Design and Implementation of Two Classification Schemes

Scheme 1: TF-IDF+MLP Baseline

Design philosophy: First establish a lightweight and interpretable baseline, then upgrade after exploring its potential
Hyperparameter search: six configurations compared; the best combines 15k TF-IDF features with a wide network (512/256/128 neurons)
Model architecture: TF-IDF vector → three fully connected layers (with Dropout) → Softmax output
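The baseline architecture described above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the project's code: the `build_mlp` name, the ReLU activations, and the dropout rate of 0.3 are assumptions, since the article only gives the layer widths and mentions Dropout. A random tensor stands in for real TF-IDF rows.

```python
import torch
from torch import nn


def build_mlp(num_features: int = 15_000, num_classes: int = 4,
              dropout: float = 0.3) -> nn.Sequential:
    """Hidden layers of 512/256/128 units with Dropout, then a 4-way output.

    ReLU and the dropout rate are assumptions; the article does not state them.
    """
    return nn.Sequential(
        nn.Linear(num_features, 512), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(512, 256), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(128, num_classes),  # raw logits; softmax is applied below
    )


model = build_mlp()
model.eval()  # disable Dropout for inference
with torch.no_grad():
    fake_tfidf = torch.rand(2, 15_000)  # stand-in for two TF-IDF vectors
    probs = torch.softmax(model(fake_tfidf), dim=1)
print(probs.shape)
```

During training one would typically pass the raw logits to `nn.CrossEntropyLoss`, which applies the softmax internally, and keep the explicit softmax for inference only.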

Scheme 2: BERT Fine-tuning

Pre-trained model: bert-base-uncased
Experimental configuration: trained on a CUDA GPU, batch size 16, maximum sequence length 256, early stopping triggered at the 7th epoch
Model architecture: BERT encoder → classification head (4-class output)

Section 04

Performance Comparison and In-depth Analysis

Performance Data

Model	Validation macro F1	Test macro F1
TF-IDF+MLP	82.60%	—
BERT	86.73%	83.67%

Classification Details

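On the test set, BERT reaches an F1 score of 0.94 for the Change category but only 0.65 for the Problem category, which the project attributes to ambiguous category definitions. Top-3 accuracy is 99.96%, which supports hybrid workflows, for example ones where the model suggests its top candidates and a human picks the final category.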
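Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch in plain Python follows; the probabilities are invented for illustration and the `top_k_accuracy` helper is hypothetical, not taken from the project.

```python
def top_k_accuracy(probabilities, true_labels, k=3):
    """Share of samples whose true class index is among the k best-scored classes."""
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        # Class indices ordered from highest to lowest probability
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical class order: 0=Incident, 1=Request, 2=Problem, 3=Change
probabilities = [
    [0.50, 0.30, 0.15, 0.05],  # true class 1: missed at top-1, found in top-3
    [0.10, 0.20, 0.62, 0.08],  # true class 2: correct at top-1
    [0.40, 0.35, 0.20, 0.05],  # true class 3: missed even in top-3
]
true_labels = [1, 2, 3]

top1 = top_k_accuracy(probabilities, true_labels, k=1)
top3 = top_k_accuracy(probabilities, true_labels, k=3)
print(f"top-1={top1:.2f}, top-3={top3:.2f}")
```

With only four classes, top-3 accuracy is a lenient measure, so it is most useful as evidence that a shortlist shown to a human agent will almost always contain the right category.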

Overfitting and Manual Testing

BERT's validation performance peaks at the 5th epoch, and later epochs show overfitting. In manual test cases, the model classifies inputs in line with their business semantics.
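A peak at epoch 5 followed by a stop at epoch 7 is consistent with early stopping using a patience of two epochs, although the article does not state the patience value. The sketch below assumes that value, and the per-epoch validation scores are placeholders rather than the project's measurements.

```python
def early_stopping_epochs(val_scores, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` epochs pass without a new best score.
    """
    best_score, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Placeholder validation macro F1 per epoch (illustrative values only)
val_scores = [0.78, 0.82, 0.84, 0.85, 0.87, 0.86, 0.86, 0.85]
best_epoch, stop_epoch = early_stopping_epochs(val_scores, patience=2)
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```

In a real training loop the weights from the best epoch would be saved at each improvement and restored after stopping, so the later overfitted epochs never reach evaluation.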

Section 05

Scheme Trade-offs and Decision Recommendations

Performance vs. Cost Comparison

Dimension	TF-IDF+MLP	BERT
Validation macro F1	82.60%	86.73%
Parameter scale	~100k	109M
Inference cost	Extremely low	High
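As a rough cross-check of the parameter scale, the arithmetic can be sketched as below. It assumes the best baseline configuration maps 15k TF-IDF inputs through 512, 256, and 128 units to 4 outputs; the `dense_params` helper is hypothetical. Under that assumption the dense layers hold roughly 7.8M parameters, so the ~100k figure in the table presumably refers to a smaller configuration or a different way of counting. Even at 7.8M, the baseline is about 14 times smaller than BERT.

```python
def dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed shape: 15k TF-IDF inputs -> 512 -> 256 -> 128 -> 4 classes
mlp_params = dense_params([15_000, 512, 256, 128, 4])
bert_params = 109_000_000  # BERT figure as listed in the comparison table
ratio = bert_params / mlp_params
print(f"MLP: {mlp_params:,} parameters, BERT is {ratio:.1f}x larger")
```

Almost all of the baseline's parameters sit in the first layer (15,000 x 512), which is why the TF-IDF vocabulary size matters more for its footprint than the depth of the network.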

Scenario Selection

TF-IDF+MLP: resource-constrained environments, a need for interpretability, or cases where a macro F1 of about 82% meets requirements
BERT: cases where the highest performance is the priority, GPU resources are sufficient, or Top-3 recommendations are needed

Section 06

Technical Implementation Highlights and Engineering Practices

Key Technologies

Early stopping mechanism: Prevents overfitting based on validation macro F1
Stratified sampling: Ensures consistent category proportions across training/validation/test sets
Transparent documentation: Explains the warnings that appear when loading BERT so that users can interpret them
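Stratified sampling, listed above, can be sketched in plain Python as follows. The 80/10/10 ratios, the seed, the toy class counts, and the `stratified_split` helper are all assumptions for illustration; the article does not give the project's split sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split sample indices into train/val/test, preserving class proportions."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy label distribution (hypothetical counts, with Change as the minority class)
labels = (["Incident"] * 400 + ["Request"] * 300
          + ["Problem"] * 200 + ["Change"] * 100)
train, val, test = stratified_split(labels)

for name, split in (("train", train), ("val", val), ("test", test)):
    share = sum(labels[i] == "Change" for i in split) / len(split)
    print(f"{name}: {len(split)} samples, Change share {share:.2f}")
```

Splitting each class separately guarantees that the minority class appears in the validation and test sets in the same proportion as in training, which keeps per-class metrics such as macro F1 comparable across splits.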

Reproducibility

The environment configuration is documented (Python 3.x, PyTorch, etc.). A Colab notebook is provided for quick execution, and the output files include model weights and datasets.

Section 07

Limitations and Future Improvement Directions

Current Limitations

The project supports English only, assumes short texts, uses a static set of categories, and still performs weakly on the Problem category.

Future Improvements

Multilingual support (mBERT/XLM-R), incremental learning, active learning, ensemble methods, domain pre-training

Section 08

Project Summary and Insights

This project is a textbook example of text classification practice that follows the 'baseline first' methodology. Through systematic experiments, it shows what both simple methods and deep learning have to offer. Core insight: good experimental design matters more than model complexity, and performance and cost must be balanced according to business needs. It is a useful reference for text classification engineers and researchers.