DOAJ Open Access 2025

Label-Guided Data Augmentation for Chinese Named Entity Recognition

Miao Jiang Honghui Chen

Abstract

Chinese named entity recognition (NER) is a fundamental natural language processing (NLP) task that involves identifying and categorizing entities in text. It plays a crucial role in applications such as information extraction, machine translation, and question-answering systems, enhancing the efficiency and accuracy of text processing and language understanding. However, existing methods for Chinese NER face challenges due to the disruption of character-level semantics in traditional data augmentation, leading to misaligned entity labels and reduced prediction accuracy. Moreover, the reliance on English-centric fine-grained annotated datasets and the simplistic concatenation of label semantic embeddings with original samples limits their effectiveness, particularly in addressing class imbalances in low-resource scenarios. To address these issues, we propose a novel Chinese NER model, LGDA, which leverages Label-Guided Data Augmentation to mitigate entity label misalignment and sample distribution imbalances. The LGDA model consists of three key components: a data augmentation module, a label semantic fusion module, and an optimized loss function. It operates in two stages: (1) the enhancement of data with a masked entity generation model and (2) the integration of label annotations to refine entity recognition. By employing twin encoders and a cross-attention mechanism, the model fuses sample and label semantics, while the optimized loss function adapts to class imbalances. Extensive experiments on two public datasets, OntoNotes 4.0 (Chinese) and MSRA, demonstrate the effectiveness of LGDA, achieving significant performance improvements over baseline models. Notably, the data augmentation module proves particularly effective in few-shot settings.
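
The fusion step described above, in which a cross-attention mechanism combines sample and label semantics, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoders, the projection layers, or how the fused output is formed, so the function name `cross_attention`, the residual addition, and the use of raw vectors without learned query/key/value projections are all assumptions made for illustration. Each token vector (standing in for the output of the sample encoder) attends over the label vectors (standing in for the output of the label encoder) with scaled dot-product attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(token_vecs, label_vecs):
    """Fuse each token vector with an attention-weighted mix of label vectors.

    token_vecs: list of d-dim lists (hypothetical sample-encoder output)
    label_vecs: list of d-dim lists (hypothetical label-encoder output)
    Returns (fused, weights), where
    fused[i] = token_vecs[i] + sum_j weights[i][j] * label_vecs[j].
    Assumption: no learned projections; a real model would apply them.
    """
    d = len(token_vecs[0])
    scale = math.sqrt(d)
    fused, weights = [], []
    for t in token_vecs:
        # Scaled dot-product score between this token and every label.
        scores = [sum(a * b for a, b in zip(t, lab)) / scale for lab in label_vecs]
        w = softmax(scores)
        # Label context: attention-weighted average of the label vectors.
        context = [
            sum(w[j] * label_vecs[j][k] for j in range(len(label_vecs)))
            for k in range(d)
        ]
        # Residual fusion of token and label context (an assumed design choice).
        fused.append([t[k] + context[k] for k in range(d)])
        weights.append(w)
    return fused, weights
```

In this sketch each row of `weights` is a probability distribution over the label set, so a token whose representation is closest to a given label description draws most of its added context from that label.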

Citation

Jiang, M., Chen, H. (2025). Label-Guided Data Augmentation for Chinese Named Entity Recognition. https://doi.org/10.3390/app15052521

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.3390/app15052521
Access: Open Access