arXiv Open Access 2025

Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition

Junhong Ye, Xu Yuan, Xinying Qiu

Abstract

Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-domain data fusion, and sample-efficient learning for PII recognition. Using annotated corpora from healthcare (I2B2), legal (TAB), and biography (Wikipedia), we evaluate models across four dimensions: in-domain performance, cross-domain transferability, fusion, and few-shot learning. Results show legal-domain data transfers well to biographical texts, while medical domains resist incoming transfer. Fusion benefits are domain-specific, and high-quality recognition is achievable with only 10% of training data in low-specialization domains.

Authors (3)

Junhong Ye

Xu Yuan

Xinying Qiu

Citation Format

Ye, J., Yuan, X., Qiu, X. (2025). Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition. https://arxiv.org/abs/2507.11862

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access