MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Abstract
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conduct extensive POS baseline experiments using conditional random fields (CRF) and several multilingual pre-trained language models, and we apply various cross-lingual transfer models trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves POS tagging performance on the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
Authors (44)
Cheikh M. Bamba Dione
David Adelani
Peter Nabende
Jesujoba Alabi
Thapelo Sindane
Happy Buzaaba
Shamsuddeen Hassan Muhammad
Chris Chinenye Emezue
Perez Ogayo
Anuoluwapo Aremu
Catherine Gitau
Derguene Mbaye
Jonathan Mukiibi
Blessing Sibanda
Bonaventure F. P. Dossou
Andiswa Bukula
Rooweither Mabuya
Allahsera Auguste Tapo
Edwin Munkoh-Buabeng
Victoire Memdjokam Koagne
Fatoumata Ouoba Kabore
Amelia Taylor
Godson Kalipe
Tebogo Macucwa
Vukosi Marivate
Tajuddeen Gwadabe
Mboning Tchiaze Elvis
Ikechukwu Onyenwe
Gratien Atindogbe
Tolulope Adelani
Idris Akinade
Olanrewaju Samuel
Marien Nahimana
Théogène Musabeyezu
Emile Niyomutabazi
Ester Chimhenga
Kudzai Gotosa
Patrick Mizha
Apelete Agbolo
Seydou Traore
Chinedu Uchechukwu
Aliyu Yusuf
Muhammad Abdullahi
Dietrich Klakow
Quick Access
- Publication Year: 2023
- Language: en
- Source Database: arXiv
- Access: Open Access