Section 01
[Introduction] Research on Sensitive Information Detection in Pre-training Corpora of Japanese Large Language Models (arXiv 2026)
This study is the first to systematically examine the detection of Special Care Personal Information (SCPI) in the pre-training corpora of Japanese large language models, a gap that prior work had left open. It uses LLM-assisted annotation to build datasets and trains classifiers adapted to the characteristics of Japanese text, supporting privacy compliance and data security for Japanese large language models. The original paper was posted to arXiv on June 10, 2026: http://arxiv.org/abs/2606.12114v1.