arXiv Open Access 2022

An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts

Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN, Simon Corston-Oliver

Abstract

We present a simple yet effective method for training a named entity recognition (NER) model on business telephone conversation transcripts, which are noisy both because of the nature of spoken conversation and because of artifacts introduced by automatic speech recognition. We first fine-tune LUKE, a state-of-the-art NER model, on a limited number of transcripts, then use it as a teacher to train a smaller DistilBERT-based student model on a large amount of weakly labeled data and a small amount of human-annotated data. The model achieves high accuracy while also satisfying the practical constraints for inclusion in a commercial telephony product: real-time performance when deployed on cost-effective CPUs rather than GPUs.
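The teacher-student recipe described above can be sketched as a data-preparation step: the fine-tuned teacher assigns weak labels to unlabeled transcripts, and the small human-annotated set is mixed in before the student is trained. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The names `toy_teacher`, `TOY_GAZETTEER`, `build_student_training_set`, and the `gold_upsample` knob are hypothetical; a gazetteer lookup stands in for the fine-tuned LUKE teacher; the BIO tag scheme and entity types are assumed; and the abstract does not say how the weak and human-labeled data are combined.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A labeled example: (tokens, BIO tags), one tag per token (assumed scheme).
Example = Tuple[List[str], List[str]]

# Hypothetical gazetteer standing in for the fine-tuned LUKE teacher.
# ASR output is often lowercased, so lookups use lowercased tokens.
TOY_GAZETTEER: Dict[str, str] = {"john": "PER", "acme": "ORG"}


def toy_teacher(tokens: Sequence[str]) -> List[str]:
    """Stand-in teacher: tags gazetteer hits as single-token entities."""
    tags = []
    for tok in tokens:
        label = TOY_GAZETTEER.get(tok.lower())
        tags.append(f"B-{label}" if label else "O")
    return tags


def build_student_training_set(
    teacher: Callable[[Sequence[str]], List[str]],
    unlabeled: Sequence[List[str]],
    gold: Sequence[Example],
    gold_upsample: int = 1,
) -> List[Example]:
    """Weakly label transcripts with the teacher, then add human-labeled data.

    gold_upsample (hypothetical) repeats the small human-annotated set so it
    is not swamped by the much larger weakly labeled set.
    """
    weak: List[Example] = []
    for tokens in unlabeled:
        tags = teacher(tokens)
        if len(tags) != len(tokens):
            raise ValueError("teacher must return exactly one tag per token")
        weak.append((list(tokens), tags))
    human = [(list(toks), list(tags)) for toks, tags in gold]
    return weak + human * gold_upsample


# Usage: one weakly labeled transcript plus one human-annotated example.
unlabeled = [["hi", "this", "is", "john", "from", "acme"]]
gold = [(["call", "me", "tomorrow"], ["O", "O", "B-DATE"])]
data = build_student_training_set(toy_teacher, unlabeled, gold, gold_upsample=2)
for tokens, tags in data:
    print(list(zip(tokens, tags)))
```

Training the DistilBERT student on the resulting examples (for instance with a token-classification head) is outside the scope of this sketch; the point is only that the student sees many teacher-labeled transcripts alongside a smaller set of human labels.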

Citation

Fu, X., Chen, C., Laskar, M.T.R., TN, S.B., Corston-Oliver, S. (2022). An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts. https://arxiv.org/abs/2209.13736

Publication Information
Year Published: 2022
Language: English
Database Source: arXiv
Access: Open Access