Tuning Multilingual Transformers for Language-Specific Named Entity Recognition
Abstract
Our paper addresses the problem of multilingual named entity recognition for four Slavic languages: Russian, Bulgarian, Czech, and Polish. We solve this task with the BERT model, using a multilingual model pre-trained on one hundred languages as the base for transfer to these Slavic languages. Unsupervised pre-training of the BERT model on the four target languages allows us to significantly outperform baseline neural approaches and multilingual BERT. A further improvement is achieved by extending BERT with a word-level CRF layer. Our system was submitted to the BSNLP 2019 Shared Task on Multilingual Named Entity Recognition and demonstrated top performance in the multilingual setting for two of the competition metrics. We have open-sourced the NER models and the BERT model pre-trained on the four Slavic languages.
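The word-level CRF extension mentioned in the abstract can be pictured as a BERT encoder whose first-subword hidden states are projected to tag emissions and scored by a linear-chain CRF. Below is a minimal sketch under assumed tooling (the `transformers` and `pytorch-crf` packages); the class name `BertWordCRF` and the `word_starts`/`word_mask` inputs are illustrative and are not taken from the paper's released code.

```python
import torch
import torch.nn as nn
from torchcrf import CRF           # pytorch-crf package (assumed dependency)
from transformers import AutoModel


class BertWordCRF(nn.Module):
    """Illustrative BERT encoder with a word-level CRF tagging head."""

    def __init__(self, model_name: str, num_tags: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.proj = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, word_starts, word_mask, tags=None):
        # Subword-level contextual states from BERT.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Gather the hidden state of the first subword of every word, so the
        # CRF sees one emission per word rather than per subword.
        # word_starts: LongTensor (batch, max_words) of first-subword indices.
        idx = word_starts.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        word_hidden = hidden.gather(1, idx)    # (batch, max_words, hidden)
        emissions = self.proj(word_hidden)     # (batch, max_words, num_tags)
        # word_mask: bool tensor (batch, max_words), zeros only for padding.
        if tags is not None:
            # Negative log-likelihood of the gold word-level tag sequence.
            return -self.crf(emissions, tags, mask=word_mask)
        # Viterbi decoding returns one tag id per unpadded word.
        return self.crf.decode(emissions, mask=word_mask)
```

Pooling only the first subword of each word keeps the CRF transition matrix at word granularity, which matches the word-level tagging scheme of the shared task while still letting BERT operate on its native subword vocabulary.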
Authors (4)
Mikhail Arkhipov
M. Trofimova
Yuri Kuratov
A. Sorokin
Quick Access
- Publication Year: 2019
- Language: English (en)
- Total Citations: 102
- Source Database: Semantic Scholar
- DOI: 10.18653/v1/W19-3712
- Access: Open Access ✓