Zing Forum

Intelligent Classification of IT Work Orders: Comparative Practice of TF-IDF and BERT Dual-Track Schemes

A complete IT support work order classification project comparing the traditional TF-IDF+MLP baseline model with the BERT fine-tuning scheme. Through structured hyperparameter search and detailed performance analysis, it demonstrates the trade-offs between classical methods and deep learning in text classification tasks.

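Tags: text classification, BERT, TF-IDF, work order classification, PyTorch, Transformer, ITSM, hyperparameter optimization, machine learning, natural language processing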
Published 2026-06-04 03:44 | Recent activity 2026-06-04 03:54 | Estimated read 7 min

Section 01

[Introduction] Core Summary of Intelligent IT Work Order Classification: Comparative Practice of TF-IDF and BERT Dual-Track Schemes

This project addresses work order classification in the IT Service Management (ITSM) domain by comparing a traditional TF-IDF+MLP baseline with a fine-tuned BERT model. Using a structured hyperparameter search and multi-dimensional performance analysis, it shows how classical methods and deep learning trade off in performance, cost, and interpretability, and offers a reference for model selection in real business scenarios.

Section 02

Project Background and Dataset Feature Analysis

Task and Constraints

The input is the text of an IT work order; the output is one of four categories: Incident, Request, Problem, or Change. Only the text body is used as a feature. The data is imbalanced (the Change category accounts for 10.8% of samples), and the experiments are set up to be reproducible.

Dataset Features

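The dataset has a total of 11,921 samples; after cleaning, the body and type fields are retained. Because Change is a minority class, macro F1 is used as the main evaluation metric: it averages the per-class F1 scores with equal weight, so poor performance on a small class is not masked by the larger classes.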
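
To make the choice of metric concrete, here is a minimal pure-Python sketch of macro F1. The function name and the toy labels are illustrative assumptions, not taken from the project; the point is only that a classifier which ignores a minority class can score high accuracy and still get a low macro F1.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): nine majority-class tickets and one Change ticket.
y_true = ["Incident"] * 9 + ["Change"]
y_pred = ["Incident"] * 10  # a model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels=["Incident", "Change"])
print(f"accuracy={accuracy:.2f}  macro F1={score:.2f}")
```

In this toy case the accuracy is 0.90, yet the Change class contributes an F1 of 0, which pulls macro F1 below 0.5.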

Section 03

Design and Implementation of Two Classification Schemes

Scheme 1: TF-IDF+MLP Baseline

  • Design philosophy: establish a lightweight, interpretable baseline first, and move to a heavier model only after the baseline's potential has been explored
  • Hyperparameter search: six configurations were compared; the best uses 15k TF-IDF features and a wide network (512/256/128 neurons)
  • Model architecture: TF-IDF vector → three fully connected layers (with Dropout) → Softmax output
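
The architecture described above could be sketched in PyTorch roughly as follows. This is an illustration under assumptions, not the project's code: the ReLU activations, the dropout rate of 0.3, the reading of "three fully connected layers" as three hidden layers followed by a 4-way output layer, and the name `build_mlp` are all hypothetical.

```python
import torch
from torch import nn


def build_mlp(num_features=15_000, hidden=(512, 256, 128),
              num_classes=4, dropout=0.3):
    """Stack Linear -> ReLU -> Dropout blocks, then a final classification layer.

    The activation and dropout rate are assumptions; the source only states
    the layer widths and that Dropout is used.
    """
    layers = []
    in_dim = num_features
    for width in hidden:
        layers += [nn.Linear(in_dim, width), nn.ReLU(), nn.Dropout(dropout)]
        in_dim = width
    layers.append(nn.Linear(in_dim, num_classes))  # raw logits
    return nn.Sequential(*layers)


# Small feature size here only to keep the demo light; the best reported
# configuration uses 15k TF-IDF features.
model = build_mlp(num_features=50)
model.eval()  # disables Dropout for inference

x = torch.rand(3, 50)  # stand-in for three TF-IDF vectors
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
print(probs.shape)
```

During training with `nn.CrossEntropyLoss`, the raw logits are passed to the loss directly; the explicit softmax is only needed when class probabilities are wanted at inference time.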

Scheme 2: BERT Fine-tuning

  • Pre-trained model: bert-base-uncased
  • Experimental configuration: trained on a CUDA GPU with batch size 16 and maximum sequence length 256; early stopping ended training at the 7th epoch
  • Model architecture: BERT encoder → classification head (4-class output)
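
A hedged sketch of how such a model could be set up with the Hugging Face `transformers` library is shown below. The helper `build_classifier` and the tiny offline configuration are assumptions made so the example runs without downloading weights; only the model name, the four labels, the batch size, and the sequence length come from the description above. Loading `bert-base-uncased` into a sequence classification model normally prints a warning that the classifier weights are newly initialized, which is expected before fine-tuning.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

NUM_LABELS = 4    # Incident / Request / Problem / Change
BATCH_SIZE = 16   # batch size reported above
MAX_LENGTH = 256  # maximum sequence length reported above


def build_classifier(pretrained=True):
    """Return a BERT encoder with a 4-way classification head."""
    if pretrained:
        # Downloads the checkpoint; the classification head starts untrained.
        return BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=NUM_LABELS
        )
    # Hypothetical tiny configuration, used only to check shapes offline.
    tiny = BertConfig(hidden_size=32, num_hidden_layers=1,
                      num_attention_heads=2, intermediate_size=64,
                      num_labels=NUM_LABELS)
    return BertForSequenceClassification(tiny)


model = build_classifier(pretrained=False)
model.eval()

# Random token ids stand in for tokenized ticket bodies.
input_ids = torch.randint(0, model.config.vocab_size, (BATCH_SIZE, MAX_LENGTH))
attention_mask = torch.ones_like(input_ids)
with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
print(logits.shape)
```

For real tickets, the text would first be tokenized with the matching tokenizer, truncated to the maximum length, and the model would be moved to the GPU before training.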

Section 04

Performance Comparison and In-depth Analysis

Performance Data

Model | Validation macro F1 | Test macro F1
TF-IDF+MLP | 82.60% | (not given)
BERT | 86.73% | 83.67%

Classification Details

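On the test set, BERT reaches an F1 score of 0.94 for the Change category but only 0.65 for the Problem category, which is attributed to the ambiguous definition of that category. Top-3 accuracy is 99.96%, which supports hybrid workflows in which the model suggests candidate categories and a person makes the final choice.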
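
Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The following pure-Python sketch shows the computation; the function name and the probability rows are made up for illustration and are not the project's outputs.

```python
def top_k_accuracy(probabilities, true_indices, k=3):
    """Fraction of samples whose true class is among the k most probable."""
    hits = 0
    for probs, true_idx in zip(probabilities, true_indices):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(true_indices)


# Hypothetical class probabilities for four tickets over four categories.
probabilities = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.45, 0.20, 0.10],
]
true_indices = [0, 1, 3, 1]

top1 = top_k_accuracy(probabilities, true_indices, k=1)
top3 = top_k_accuracy(probabilities, true_indices, k=3)
print(top1, top3)
```

With only four categories, a top-3 list excludes a single class, so top-3 accuracy is naturally much higher than top-1 and is best read alongside the per-class F1 scores.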

Overfitting and Manual Testing

BERT's validation performance peaks at the 5th epoch, after which the model begins to overfit. On manually written test cases, the model's predictions reflect the business meaning of the tickets.
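
One common way to handle this is early stopping with a patience counter on validation macro F1. The sketch below is an assumption about how it could work, not the project's implementation: the patience of 2 and the validation curve are hypothetical, chosen so that the best epoch is 5.

```python
def run_with_early_stopping(val_scores, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation score has failed to improve
    for `patience` consecutive epochs.
    """
    best_score = float("-inf")
    best_epoch = 0
    stop_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        stop_epoch = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
            # A real training loop would save a checkpoint here.
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, stop_epoch


# Hypothetical validation macro F1 per epoch, peaking at epoch 5.
val_macro_f1 = [0.70, 0.78, 0.82, 0.85, 0.87, 0.86, 0.86, 0.85]
best_epoch, stop_epoch = run_with_early_stopping(val_macro_f1, patience=2)
print(best_epoch, stop_epoch)
```

Under these assumptions the loop stops two epochs after the peak, and the checkpoint from the best epoch is the one that would be evaluated on the test set.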

Section 05

Scheme Trade-offs and Decision Recommendations

Performance vs. Cost Comparison

Dimension | TF-IDF+MLP | BERT
Macro F1 | 82.60% | 86.73% (validation; 83.67% on test)
Parameter scale | ~100k | 109M
Inference cost | Extremely low | High

Scenario Selection

  • TF-IDF+MLP: resource-constrained environments, a need for interpretability, or cases where a macro F1 of about 82% meets requirements
  • BERT: cases where maximum performance is the priority, GPU resources are sufficient, or Top-3 recommendations are needed

Section 06

Technical Implementation Highlights and Engineering Practices

Key Technologies

  • Early stopping: training is halted based on validation macro F1 to prevent overfitting
  • Stratified sampling: keeps category proportions consistent across the training, validation, and test sets
  • Transparent documentation: explains the warnings shown when loading BERT so that users understand them
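
Stratified sampling can be sketched in pure Python as follows. The 80/10/10 split, the fixed seed, and the toy label counts are assumptions for illustration; the source does not state the actual split ratios.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split sample indices into train/val/test, class by class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy label distribution (hypothetical), with Change as the minority class.
labels = (["Incident"] * 40 + ["Request"] * 30
          + ["Problem"] * 20 + ["Change"] * 10)
train_idx, val_idx, test_idx = stratified_split(labels)

for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(name, dict(Counter(labels[i] for i in idx)))
```

Each split keeps the same class proportions as the full dataset, so the minority Change class is represented in validation and test rather than left to chance.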

Reproducibility

The environment is clearly specified (Python 3.x, PyTorch, etc.). A Colab notebook is provided for quick execution, and the output files include model weights and datasets.

Section 07

Limitations and Future Improvement Directions

Current Limitations

The project supports English only, assumes short texts, treats the set of categories as static, and leaves room for improvement on the Problem category.

Future Improvements

Possible directions include multilingual support (mBERT/XLM-R), incremental learning, active learning, ensemble methods, and domain-specific pre-training.

Section 08

Project Summary and Insights

This project is a textbook example of text classification practice built on the 'baseline first' methodology. Its systematic experiments show the value of both simple methods and deep learning. The core insight is that good experimental design matters more than model complexity, and that performance and cost must be balanced against business needs. It is a useful reference for text classification engineers and researchers.