arXiv Open Access 2021

EENLP: Cross-lingual Eastern European NLP Index

Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George Dima, Réka Cserháti, +2 others

Abstract

Motivated by the sparsity of NLP resources for Eastern European languages, we present a broad index of existing Eastern European language resources (90+ datasets and 45+ models), published as a GitHub repository open for updates from the community. Furthermore, to support the evaluation of commonsense reasoning tasks, we provide hand-crafted cross-lingual datasets for five different semantic tasks (news categorization, paraphrase detection, natural language inference (NLI), tweet sentiment detection, and news sentiment detection) for some of the Eastern European languages. We perform several experiments with existing multilingual models on these datasets to establish performance baselines and compare them with existing results for other languages.

Topics & Keywords

cs.CL cs.AI cs.NE

Authors (7)

Alexey Tikhonov

Alex Malkhasov

Andrey Manoshin

George Dima

Réka Cserháti

Md. Sadek Hossain Asif

Matt Sárdi

Citation Format

Tikhonov, A., Malkhasov, A., Manoshin, A., Dima, G., Cserháti, R., Asif, M.S.H. et al. (2021). EENLP: Cross-lingual Eastern European NLP Index. https://arxiv.org/abs/2108.02605

Journal Information

Publication Year: 2021
Language: en
Source Database: arXiv
Access: Open Access ✓