Cross-lingual Name Tagging and Linking for 282 Languages
Abstract
The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and on non-Wikipedia data.
Authors (6)
Xiaoman Pan
Boliang Zhang
Jonathan May
J. Nothman
Kevin Knight
Heng Ji
Quick Access
- Publication Year: 2017
- Language: en
- Total Citations: 562
- Database Source: Semantic Scholar
- DOI: 10.18653/v1/P17-1178
- Access: Open Access ✓