Tuning Multilingual Transformers for Language-Specific Named Entity Recognition

Mikhail Arkhipov, M. Trofimova, Yuri Kuratov, A. Sorokin

Abstract

Our paper addresses the problem of multilingual named entity recognition on the material of four languages: Russian, Bulgarian, Czech, and Polish. We solve this task using the BERT model, taking a multilingual model trained on a hundred languages as the base for transfer to the mentioned Slavic languages. Unsupervised pre-training of the BERT model on these four languages allows us to significantly outperform baseline neural approaches and multilingual BERT. Additional improvement is achieved by extending BERT with a word-level CRF layer. Our system was submitted to the BSNLP 2019 Shared Task on Multilingual Named Entity Recognition and demonstrated top performance in the multilingual setting for two competition metrics. We open-sourced the NER models and the BERT model pre-trained on the four Slavic languages.
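The BERT-plus-CRF architecture described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes the Hugging Face transformers library and the pytorch-crf package, and the input tensors word_ids and word_mask (marking the first subtoken of each word) are hypothetical helpers introduced for the example. The checkpoint name bert-base-multilingual-cased stands in for the authors' Slavic-specific pre-trained model.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # pip install transformers
from torchcrf import CRF            # pip install pytorch-crf

class BertWordCrfTagger(nn.Module):
    """Sketch of a BERT encoder with a word-level CRF head for NER."""

    def __init__(self, model_name="bert-base-multilingual-cased", num_tags=9):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, word_ids, word_mask, tags=None):
        # word_ids[b, w]: index of the first subtoken of word w in example b;
        # word_mask[b, w]: 1 for real words, 0 for padding positions.
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        # Keep one vector per word (its first subtoken), so the CRF scores
        # transitions between word-level tags rather than subword tokens.
        idx = word_ids.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        emissions = self.classifier(self.dropout(hidden.gather(1, idx)))
        mask = word_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```

The CRF layer adds learned transition scores between adjacent tags, letting the model enforce sequence constraints (for example, that I-PER cannot follow B-LOC under a BIO scheme) that an independent per-token softmax cannot.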

Authors (4)

Mikhail Arkhipov

M. Trofimova

Yuri Kuratov

A. Sorokin

Citation Format

Arkhipov, M., Trofimova, M., Kuratov, Y., & Sorokin, A. (2019). Tuning Multilingual Transformers for Language-Specific Named Entity Recognition. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. https://doi.org/10.18653/v1/W19-3712

Quick Access

View at Source: doi.org/10.18653/v1/W19-3712
Journal Information
Publication Year
2019
Language
en
Total Citations
102×
Source Database
Semantic Scholar
DOI
10.18653/v1/W19-3712
Access
Open Access ✓