Zing Forum

PII Data Desensitization Practice: Comparison of Fine-tuned BERT and Zero-shot LLM Dual-track Solutions

This article introduces a complete Personally Identifiable Information (PII) detection and desensitization system. By comparing two technical approaches, a fine-tuned BERT model and zero-shot LLM prompt engineering, it demonstrates how to achieve high-precision automatic recognition and desensitization of names and email addresses in real-world scenarios.

Tags: PII Desensitization · BERT Named Entity Recognition · LLM Zero-shot Learning · Privacy Protection · NLP
Published 2026-04-17 20:40 · Recent activity 2026-04-17 20:48 · Estimated read: 6 min

Section 01

Introduction: Practice of Comparing Dual-track PII Data Desensitization Solutions

This article introduces a complete PII detection and desensitization system, comparing two technical approaches: a fine-tuned BERT model and zero-shot LLM prompt engineering. It shows how to achieve high-precision recognition and desensitization of names and email addresses in real-world scenarios, providing an engineering reference for PII desensitization.

Section 02

Background and Problem Definition

Personally Identifiable Information (PII) is data that can identify an individual, such as names, email addresses, and phone numbers. It must be automatically desensitized in scenarios such as log analysis, customer service records, and dataset publishing. Traditional rule-based methods perform poorly on name recognition, and manual review cannot scale to large volumes of data, which makes deep learning solutions the mainstream choice.
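To make the limitation concrete, here is a minimal sketch (the sample text and patterns are illustrative, not from the project): a simple regex finds email addresses reliably, but a capitalization heuristic for names both over- and under-matches.

```python
import re

# A pragmatic (not RFC-complete) email pattern works well for emails...
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

text = "Contact Alice Johnson at alice.j@example.com or Bob at bob@test.org."

print(EMAIL_RE.findall(text))  # ['alice.j@example.com', 'bob@test.org']

# ...but there is no comparable regex for person names: a capitalized-word
# heuristic over-matches ordinary words and would miss unusual names.
NAME_HEURISTIC = re.compile(r"\b[A-Z][a-z]+\b")
print(NAME_HEURISTIC.findall(text))
# ['Contact', 'Alice', 'Johnson', 'Bob'] -- 'Contact' is a false positive
```

This asymmetry is why the article pairs deterministic rules (for emails) with a learned model (for names).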

Section 03

Detailed Explanation of Dual-track Technical Solutions

Fine-tuned BERT Model

Fine-tuned from bert-base-uncased on the WikiNeural dataset, with synthetic email data augmentation (samples expanded from 28,516 to 37,205). Five label categories are defined (O/B-PER/I-PER/B-EMAIL/I-EMAIL). Training configuration: 3 epochs, learning rate 2e-5, batch size 8, weight decay 0.01.
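The five-category BIO label scheme can be sketched as follows (the label names come from the article; the id mapping and sample sentence are illustrative):

```python
# BIO label scheme for token classification, per the article.
LABELS = ["O", "B-PER", "I-PER", "B-EMAIL", "I-EMAIL"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

# Word-level annotation for an illustrative sentence:
words = ["Email", "John", "Smith", "at", "js@example.com"]
tags  = ["O", "B-PER", "I-PER", "O", "B-EMAIL"]
ids   = [label2id[t] for t in tags]
print(ids)  # [0, 1, 2, 0, 3]
```

These integer ids are what the token-classification head predicts; the hyperparameters above (3 epochs, lr 2e-5, batch size 8, weight decay 0.01) would be passed to the trainer configuration.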

Zero-shot LLM Prompt Engineering

The Qwen2.5-1.5B-Instruct model is used, producing structured JSON output through few-shot prompting to curb hallucination. Post-processing includes hallucination filtering, email repair, and a regex fallback.
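The post-processing steps can be sketched like this (the function name and JSON schema are assumptions for illustration; the project's actual interfaces may differ):

```python
import json
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def postprocess(llm_output: str, source_text: str) -> dict:
    """Filter hallucinated entities and fall back to regex for emails."""
    try:
        entities = json.loads(llm_output)
    except json.JSONDecodeError:
        entities = {"names": [], "emails": []}

    # Hallucination filtering: keep only spans that literally occur in the input.
    names = [n for n in entities.get("names", []) if n in source_text]
    emails = [e for e in entities.get("emails", []) if e in source_text]

    # Regex fallback: recover any emails the model missed.
    for e in EMAIL_RE.findall(source_text):
        if e not in emails:
            emails.append(e)
    return {"names": names, "emails": emails}

text = "Please reach Jane Doe at jane@corp.io."
raw = '{"names": ["Jane Doe", "John Roe"], "emails": []}'
print(postprocess(raw, text))
# {'names': ['Jane Doe'], 'emails': ['jane@corp.io']}
```

Here "John Roe" is dropped because it never appears in the source text, and the missed email is recovered by the regex fallback.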

Section 04

Core Technical Innovations

  1. Hybrid Inference Pipeline: the BERT solution uses a layered regex + neural network strategy, balancing the determinism of rules with the generalization ability of the model;
  2. Intelligent Tokenization Handling: solves the problem of BERT subword tokenization breaking entity boundaries, ensuring labels stay aligned with tokens;
  3. Robustness Enhancement: the BERT side adds confidence filtering and label correction; the LLM side adds hallucination detection and text replacement mechanisms.
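Point 2, subword label alignment, can be sketched with a word_ids mapping of the kind fast tokenizers expose (the helper below is illustrative, not the project's code):

```python
def align_labels(word_labels, word_ids, label2id, ignore_index=-100):
    """Expand word-level BIO labels to subword tokens.

    Only the first subword of each word keeps a real label; continuation
    subwords and special tokens get ignore_index so the loss skips them.
    """
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:               # special tokens like [CLS]/[SEP]
            aligned.append(ignore_index)
        elif wid != prev:             # first subword of a new word
            aligned.append(label2id[word_labels[wid]])
        else:                         # continuation subword of same word
            aligned.append(ignore_index)
        prev = wid
    return aligned

label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-EMAIL": 3, "I-EMAIL": 4}
# "Alice Johnson" where "Johnson" splits into "John" + "##son":
word_labels = ["B-PER", "I-PER"]
word_ids = [None, 0, 1, 1, None]      # [CLS], Alice, John, ##son, [SEP]
print(align_labels(word_labels, word_ids, label2id))
# [-100, 1, 2, -100, -100]
```

Masking continuation subwords keeps entity boundaries intact: a B- label is never duplicated across the pieces of a single word.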

Section 05

Comparative Analysis of Experimental Results

Fine-tuned BERT Performance

  • Token-level accuracy 99.53%; entity-level precision 96.98%, recall 97.31%, F1 97.15%; false positive rate 0.25%, miss rate 1.36%.

Zero-shot LLM Performance

Metric      Name (Strict)   Name (Partial)   Email
Precision   82.93%          86.99%           83.93%
Recall      51.78%          52.71%           100%
F1          63.75%          65.64%           91.26%
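For reference, the entity-level figures follow the standard precision/recall/F1 definitions; the counts below are illustrative values chosen to roughly reproduce the strict-name row, not the project's actual tallies:

```python
def prf(tp: int, fp: int, fn: int):
    """Entity-level precision, recall, F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts approximating the strict name-matching row:
p, r, f = prf(tp=102, fp=21, fn=95)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")  # P=82.93% R=51.78% F1=63.75%
```

The low F1 for names is driven almost entirely by recall: the false-negative count dominates the false-positive count.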

Comprehensive Comparison

Dimension            Fine-tuned BERT          Zero-shot LLM
Name F1              97.15%                   65.64%
Email F1             >99%                     91.26%
Requires training    Yes (~7 min)             No
Inference speed      Fast (~15 samples/sec)   Slow (~1 sample/sec)
Adaptability         Needs retraining         High
Hallucination risk   None                     Mitigated

Section 06

Error Pattern Analysis

Fine-tuned BERT Errors

  1. False positives on common words (e.g., "No" misjudged as a name);
  2. Sensitivity to tokenization;
  3. Misses on unseen naming patterns.

Zero-shot LLM Errors

  1. Low recall for names;
  2. Inaccurate entity boundary recognition;
  3. Confusion of email components;
  4. Over-identification of non-name entities.

Section 07

Key Engineering Practice Points and Future Optimization

Engineering Practice

  • Data preparation: Data augmentation via command line (python main.py augment --email-ratio 0.5);
  • Training evaluation: Automated workflow (python main.py train/evaluate);
  • Production inference: Supports command line invocation (python main.py predict).

Future Directions

Planned directions: a hybrid system, constrained decoding, a model upgrade (DeBERTa-v3), probability calibration, more diverse email generation, and active learning.

Section 08

Summary of Practical Application Value

The project provides a complete technical-selection and implementation reference for PII desensitization: choose fine-tuned BERT when precision matters, and the zero-shot LLM for rapid validation. The code repository is clearly structured and serves as a practical reference for NER and desensitization techniques.