# Intelligent Classification of IT Work Orders: Comparative Practice of TF-IDF and BERT Dual-Track Schemes

> A complete IT support work order classification project comparing the traditional TF-IDF+MLP baseline model with the BERT fine-tuning scheme. Through structured hyperparameter search and detailed performance analysis, it demonstrates the trade-offs between classical methods and deep learning in text classification tasks.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-03T19:44:27.000Z
- Last activity: 2026-06-03T19:54:32.738Z
- Popularity: 163.8
- Keywords: text classification, BERT, TF-IDF, work order classification, PyTorch, Transformer, ITSM, hyperparameter optimization, machine learning, natural language processing
- Page link: https://www.zingnex.cn/en/forum/thread/it-tf-idfbert
- Canonical: https://www.zingnex.cn/forum/thread/it-tf-idfbert

---

## Introduction

This project tackles work order (ticket) classification in the IT Service Management (ITSM) domain by comparing a traditional TF-IDF+MLP baseline with BERT fine-tuning. It uses structured hyperparameter search and multi-dimensional performance analysis to show the trade-offs between classical methods and deep learning in performance, cost, and interpretability. The result is a reference for model selection in real business scenarios.

## Project Background and Dataset Feature Analysis

### Task and Constraints
The input is the body text of an IT work order. The output is one of four categories: Incident, Request, Problem, or Change. Only the body text is used as input. The data is imbalanced: the Change category accounts for just 10.8% of samples. All experiments are designed to be reproducible.

### Dataset Features
The dataset contains 11,921 samples. After cleaning, only the body and type fields are retained. Because Change is a minority class, macro F1 is the main evaluation metric. Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size.
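
To make the choice of metric concrete, here is a minimal pure-Python sketch of macro F1. The function names and toy data are illustrative and not taken from the project. Per-class F1 is computed one-vs-rest and then averaged without weighting. As a result, a model that never predicts the rare Change class is penalized heavily even when its accuracy looks good. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label; returns {label: f1}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally, however rare."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores[c] for c in labels) / len(labels)


if __name__ == "__main__":
    labels = ["Incident", "Request", "Problem", "Change"]
    # Hypothetical toy data: the single Change ticket is misclassified as Incident.
    y_true = ["Incident"] * 4 + ["Request"] * 3 + ["Problem"] * 2 + ["Change"]
    y_pred = ["Incident"] * 4 + ["Request"] * 3 + ["Problem"] * 2 + ["Incident"]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred, labels):.3f}")
```

On this toy data, one of ten tickets is wrong, so accuracy is 0.900. Macro F1, however, drops to 0.722 because the Change class scores an F1 of zero.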

## Design and Implementation of Two Classification Schemes

### Scheme 1: TF-IDF+MLP Baseline
- Design philosophy: establish a lightweight, interpretable baseline first, and move to a heavier model only after the baseline's potential has been fully explored
- Hyperparameter search: six configurations were compared; the best uses 15k TF-IDF features with a wide network (hidden layers of 512, 256, and 128 units)
- Model architecture: TF-IDF vector → three fully connected hidden layers (with Dropout) → 4-class Softmax output
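
The feature side of Scheme 1 can be illustrated with a small, dependency-free TF-IDF sketch. The source does not say which vectorizer the project used. This version makes the following assumptions:

- a vocabulary capped at the most frequent terms (the "15k features" setting would correspond to `max_features=15000`);
- smoothed IDF, ln((1 + n) / (1 + df)) + 1;
- L2 normalization.

Together these mirror the defaults of scikit-learn's `TfidfVectorizer`. The regex tokenizer is a further simplifying assumption.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")  # simplified tokenizer (assumption)


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_vocabulary(docs, max_features):
    """Keep the max_features terms with the highest corpus-wide counts."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    # Highest count first; ties broken alphabetically so the result is deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
    return {term: i for i, term in enumerate(sorted(t for t, _ in top))}


def fit_idf(docs, vocab):
    """Smoothed IDF per vocabulary column: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)).intersection(vocab))
    terms = sorted(vocab, key=vocab.get)  # terms in column order
    return [math.log((1 + n) / (1 + df[t])) + 1 for t in terms]


def transform(doc, vocab, idf):
    """Raw term count x IDF for in-vocabulary terms, then L2-normalized."""
    vec = [0.0] * len(vocab)
    for term, tf in Counter(tokenize(doc)).items():
        if term in vocab:
            j = vocab[term]
            vec[j] = tf * idf[j]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    # Hypothetical ticket bodies, for illustration only.
    docs = ["VPN connection fails", "Request new laptop", "VPN is slow"]
    vocab = fit_vocabulary(docs, max_features=3)
    idf = fit_idf(docs, vocab)
    print(vocab, transform("vpn vpn down", vocab, idf))
```

Each resulting vector (15,000-dimensional in the project's best configuration) is the input to the 512/256/128 fully connected layers. Rare terms receive a higher IDF weight than common ones, which is what makes the representation discriminative for short ticket texts.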

### Scheme 2: BERT Fine-tuning
- Pre-trained model: bert-base-uncased
- Experimental configuration: training on a CUDA GPU, batch size 16, maximum sequence length 256 tokens; early stopping ended training at epoch 7
- Model architecture: BERT encoder → classification head (4-class output)

## Performance Comparison and In-depth Analysis

### Performance Data
|Model|Validation macro F1|Test macro F1|
|----|-----------|-----------|
|TF-IDF+MLP|82.60%|Not reported|
|BERT|86.73%|83.67%|

### Classification Details
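On the test set, BERT reaches an F1 score of 0.94 on the Change category but only 0.65 on the Problem category, whose definition is ambiguous. Its Top-3 accuracy is 99.96%. This makes it well suited to a hybrid workflow in which the model proposes candidate categories and a human confirms the final one.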
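
Top-3 accuracy and the hybrid workflow it enables can be sketched as follows. `top_k_accuracy` counts a sample as correct if its true class is among the k highest-probability classes. `route` is a hypothetical policy that auto-assigns confident tickets and otherwise shows an agent the top three categories. Its 0.8 confidence threshold is an assumption, not a value from the project. With only four classes, Top-3 accuracy excludes just one category per ticket, so it naturally sits close to 100%.

```python
def rank_classes(probs):
    """Class indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_accuracy(prob_rows, y_true, k=3):
    """Fraction of samples whose true class index is among the k most probable."""
    hits = sum(label in rank_classes(probs)[:k] for probs, label in zip(prob_rows, y_true))
    return hits / len(y_true)


def route(probs, labels, threshold=0.8, k=3):
    """Hypothetical hybrid policy: auto-assign confident predictions,
    otherwise hand the top-k candidate categories to a human agent."""
    ranked = rank_classes(probs)
    if probs[ranked[0]] >= threshold:
        return ("auto", labels[ranked[0]])
    return ("review", [labels[i] for i in ranked[:k]])


if __name__ == "__main__":
    labels = ["Incident", "Request", "Problem", "Change"]
    print(route([0.90, 0.05, 0.03, 0.02], labels))  # confident: auto-assigned
    print(route([0.40, 0.35, 0.20, 0.05], labels))  # uncertain: agent picks from top 3
```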

### Overfitting and Manual Testing
BERT's validation performance peaks at epoch 5 and declines in later epochs as the model begins to overfit. On manually written test cases, the model correctly captures the business meaning of the tickets.
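
The reported early-stopping behavior (best validation score at epoch 5, training stopped at epoch 7) is consistent with a patience of two epochs. Below is a minimal sketch of such a stopper. The class name, the patience value, and the per-epoch scores in the demo are illustrative assumptions, not the project's logged results.

```python
class EarlyStopping:
    """Stop once validation macro F1 has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("-inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record one epoch's validation score; return True if training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
            # In a real training loop, save a checkpoint of the model weights here.
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Made-up validation macro F1 per epoch, peaking at epoch 5.
    scores = [0.70, 0.76, 0.80, 0.82, 0.84, 0.83, 0.83, 0.81]
    stopper = EarlyStopping(patience=2)
    for epoch, score in enumerate(scores, start=1):
        if stopper.step(epoch, score):
            print(f"stopped at epoch {epoch}; restore weights from epoch {stopper.best_epoch}")
            break
```

With these made-up scores, the loop stops at epoch 7 and keeps the epoch-5 checkpoint. The point is that the model that gets deployed is the best checkpoint, not the last one trained.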

## Scheme Trade-offs and Decision Recommendations

### Performance vs. Cost Comparison
|Dimension|TF-IDF+MLP|BERT|
|----|---------|----|
|Validation macro F1|82.60%|86.73%|
|Parameters|~7.8M (15k→512→256→128→4)|109M|
|Inference cost|Extremely low|High|
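
The parameter scale can be sanity-checked by counting weights directly. The sketch below makes two assumptions:

- the Scheme 1 MLP uses standard dense layers with biases (15,000 inputs, hidden layers 512/256/128, 4 outputs);
- BERT uses the published bert-base-uncased configuration (30,522-token vocabulary, hidden size 768, 12 layers, feed-forward size 3,072, 512 positions) plus a single linear 4-class head.

The function names are illustrative.

```python
def dense_params(sizes):
    """Weights + biases of a stack of fully connected layers, e.g. [in, hidden..., out]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def bert_params(vocab=30522, hidden=768, layers=12, intermediate=3072,
                max_positions=512, type_vocab=2):
    """Encoder + pooler parameters of a BERT model (defaults: bert-base-uncased)."""
    layer_norm = 2 * hidden  # gamma and beta
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden) + layer_norm  # Q, K, V, output projections
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + ffn) + pooler


if __name__ == "__main__":
    mlp = dense_params([15000, 512, 256, 128, 4])  # Dropout and Softmax add no parameters
    bert = bert_params() + dense_params([768, 4])  # plus a linear 4-class head
    print(f"TF-IDF+MLP: {mlp:,}  BERT classifier: {bert:,}  ratio: {bert / mlp:.0f}x")
```

Under these assumptions, the MLP has 7,845,252 parameters. About 98% of them sit in the first 15,000 × 512 layer, so vocabulary size dominates its cost. BERT with a 4-class head has 109,485,316 parameters, consistent with the 109M figure and roughly 14 times the MLP.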

### Scenario Selection
- TF-IDF+MLP: resource-constrained environments, cases that require interpretability, or applications where about 82.6% macro F1 is sufficient
- BERT: cases where the best possible accuracy matters and GPU resources are available, or where Top-3 category suggestions are needed

## Technical Implementation Highlights and Engineering Practices

### Key Technologies
- Early stopping: training halts when validation macro F1 stops improving, which limits overfitting
- Stratified sampling: keeps category proportions consistent across the training, validation, and test sets
- Transparent documentation: explains the warnings printed when BERT is loaded, so users know what they mean
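
Stratified, seeded splitting keeps the 10.8% Change share, and every other class's share, consistent across the three sets. It also makes the split reproducible. Below is a minimal sketch, assuming an 80/10/10 split and seed 42; the source states neither. In practice, scikit-learn's `train_test_split(..., stratify=y, random_state=...)` is the usual library route, applied twice to obtain three sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, val_frac=0.1, seed=42):
    """Return (train, val, test) index lists that preserve each class's share.

    The 80/10/10 fractions and seed 42 are illustrative assumptions.
    """
    rng = random.Random(seed)  # private, fixed-seed RNG: identical split every run
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for label in sorted(by_class):  # fixed class order keeps the RNG sequence stable
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_frac)
        n_val = round(len(idxs) * val_frac)
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    # Hypothetical labels with a 10% minority class, mimicking the imbalance.
    labels = ["Incident"] * 50 + ["Request"] * 30 + ["Problem"] * 10 + ["Change"] * 10
    train, val, test = stratified_split(labels)
    for name, part in (("train", train), ("val", val), ("test", test)):
        share = sum(labels[i] == "Change" for i in part) / len(part)
        print(f"{name}: {len(part)} samples, Change share {share:.0%}")
```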

### Reproducibility
The environment is clearly specified (Python 3.x, PyTorch, and other dependencies). A Colab notebook is provided for quick execution, and the output files include the trained model weights and the datasets.

## Limitations and Future Improvement Directions

### Current Limitations
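The model supports English only and assumes short texts (the maximum sequence length is 256 tokens). It relies on a static set of categories. Its performance on the Problem category (test F1 0.65) still needs improvement.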

### Future Improvements
Multilingual support (mBERT or XLM-R), incremental learning, active learning, ensemble methods, and domain-specific pre-training.

## Project Summary and Insights

This project is a textbook example of applied text classification that follows a "baseline first" methodology. Its systematic experiments show the value of both a simple method and deep learning. The TF-IDF+MLP baseline reaches 82.60% validation macro F1, and BERT raises this to 86.73% at a far higher computational cost. The core insight is that good experimental design matters more than model complexity, and that performance must be balanced against cost according to business needs. The project is a useful reference for text classification engineers and researchers.