arXiv Open Access 2025

Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition

Junhong Ye, Xu Yuan, Xinying Qiu

Abstract

Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-domain data fusion, and sample-efficient learning for PII recognition. Using annotated corpora from healthcare (I2B2), legal (TAB), and biography (Wikipedia), we evaluate models across four dimensions: in-domain performance, cross-domain transferability, fusion, and few-shot learning. Results show legal-domain data transfers well to biographical texts, while medical domains resist incoming transfer. Fusion benefits are domain-specific, and high-quality recognition is achievable with only 10% of training data in low-specialization domains.

Topics & Keywords

cs.CL

Authors (3)

Junhong Ye

Xu Yuan

Xinying Qiu

Citation Format

Ye, J., Yuan, X., & Qiu, X. (2025). Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition. https://arxiv.org/abs/2507.11862

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓