Semantic Scholar | Open Access | 2013 | 491 citations

Polyglot: Distributed Word Representations for Multilingual NLP

Rami Al-Rfou Bryan Perozzi S. Skiena

Abstract

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-the-art methods in English, Danish, and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.
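
The abstract's idea of inspecting embeddings "through the proximity of word groupings" can be illustrated with a nearest-neighbor lookup. The following is a minimal sketch only, not the authors' code: the four 3-dimensional vectors are invented toy values (not taken from the released Polyglot embeddings), the function names `cosine` and `nearest` are hypothetical, and cosine similarity is assumed here as the proximity measure since the abstract does not name one.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# They are NOT values from the Polyglot embeddings.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, table, k=2):
    """Return the k words in `table` most similar to `word` (excluding itself)."""
    query = table[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in table.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


if __name__ == "__main__":
    for w in ("cat", "car"):
        print(w, "->", nearest(w, embeddings, k=1))
```

With real embeddings, the same lookup would run over a vocabulary-sized table of higher-dimensional vectors. Words that cluster together under such a measure are the kind of "word groupings" the abstract refers to.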

Authors (3)

Rami Al-Rfou

Bryan Perozzi

S. Skiena

Citation Format

Al-Rfou, R., Perozzi, B., Skiena, S. (2013). Polyglot: Distributed Word Representations for Multilingual NLP. https://www.semanticscholar.org/paper/8e3f0f7a761f18cb91c11764d8d6cb3b1e9c5731

Journal Information

Publication Year: 2013
Language: en
Total Citations: 491
Source Database: Semantic Scholar
Access: Open Access