DOAJ Open Access 2024

Enhancing Word Sense Disambiguation for Amharic homophone words using Bidirectional Long Short-Term Memory network

Mequanent Degu Belete, Lijalem Getanew Shiferaw, Girma Kassa Alitasb, Tariku Sinshaw Tamir

Abstract

Amharic contains many confusable words because its script (fidel) includes redundant homophone letters: ሀ, ሐ, and ኀ (all three pronounced HA), ሠ and ሰ (both pronounced SE), አ and ዐ (both pronounced AE), and ጸ and ፀ (both pronounced TSE). This work develops a Word Sense Disambiguation (WSD) model that uses deep learning to tackle the resulting lexical ambiguity in Amharic. Because no Amharic WordNet is available, a total of 1756 examples of paired ambiguous Amharic homophone words were collected. The word pairs were ድህነት (dhnet) and ድኅነት (dhnet), ምሁር (m'hur) and ምሑር (m'hur), በአል (be'al) and በዓል (be'al), and አቢይ (abiy) and ዐቢይ (abiy). After preprocessing, the text was vectorized with word2vec, fastText, Term Frequency-Inverse Document Frequency (TF-IDF), and bag of words (BoW). The vectorized text was split into training and test data. The training data was then analysed using Naive Bayes (NB), K-nearest neighbour (KNN), logistic regression (LG), decision trees (DT), and random forests (RF), together with a random oversampling technique. Bidirectional Gated Recurrent Unit (BiGRU) and Bidirectional Long Short-Term Memory (BiLSTM) networks improved accuracy to 99.99% even with limited datasets.
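
As a rough illustration of two steps named in the abstract, bag-of-words vectorization and random oversampling of the training data, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The function names, the placeholder context tokens (`ctx_a`, `ctx_b`, and so on), and the class sizes are hypothetical and chosen only for illustration. The labels reuse one word pair from the abstract.

```python
import random
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in a token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical training contexts: placeholder tokens, not real Amharic data.
train_contexts = [
    ["ctx_a", "ctx_b"],
    ["ctx_a", "ctx_c"],
    ["ctx_b", "ctx_c"],
    ["ctx_d", "ctx_e"],
]
train_labels = ["ድህነት", "ድህነት", "ድህነት", "ድኅነት"]

# Oversample only the training split, then vectorize each context.
balanced_contexts, balanced_labels = random_oversample(train_contexts, train_labels)
vocabulary = sorted({tok for ctx in train_contexts for tok in ctx})
vectors = [bag_of_words(ctx, vocabulary) for ctx in balanced_contexts]
label_counts = Counter(balanced_labels)
```

After oversampling, both classes in this toy example have the same number of training examples, and each context is represented as a count vector over the vocabulary. Applying oversampling only to the training split, as sketched here, avoids duplicated examples leaking into the test data.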

Authors (4)

Mequanent Degu Belete

Lijalem Getanew Shiferaw

Girma Kassa Alitasb

Tariku Sinshaw Tamir

Citation Format

Belete, M.D., Shiferaw, L.G., Alitasb, G.K., Tamir, T.S. (2024). Enhancing Word Sense Disambiguation for Amharic homophone words using Bidirectional Long Short-Term Memory network. https://doi.org/10.1016/j.iswa.2024.200417

Journal Information
Year of Publication: 2024
Source Database: DOAJ
DOI: 10.1016/j.iswa.2024.200417
Access: Open Access