Semantic Scholar | Open Access | 2024 | 31 citations

Text Stemming and Lemmatization of Regional Languages in Indonesia: A Systematic Literature Review

Zaenal Abidin, Akmal Junaidi, Wamiliana

Abstract

Background: Stemming is essential in natural language processing (NLP) because it reduces word variants to their base forms. This procedure facilitates the analysis of textual data and improves the precision of classification and information retrieval.

Objective: No systematic literature review (SLR) has previously been conducted on stemming and lemmatization in the regional languages of Indonesia. Therefore, this study conducts an SLR to capture the latest developments in this area.

Methods: The review followed the Kitchenham method and analyzed 35 studies selected from 740 obtained from Scopus, IEEE Xplore, and Google Scholar and published between 2014 and 2023.

Results: Research on stemming showed potential for continued growth each year. The availability of digital dictionaries in regional languages was found to be the main factor in stemming and lemmatization studies, because a larger base vocabulary led to better stemming or lemmatization results. Morphological information about regional languages was useful for building rule-based stemmers. Meanwhile, corpus-based stemming and lemmatization studies could be conducted only for languages with a large corpus, which ensures a sufficient variety of affixed words to process.

Conclusion: Based on this SLR, research on stemming and lemmatization of regional languages in Indonesia developed significantly from 2014 to 2023. The two main strategies were the use of available digital dictionaries and of language morphology information. However, the main challenges were the limited vocabulary in the dictionaries and the testing of various rule-based methods.

Keywords: Lemmatization, Morphology, Rule-based, Stemming, Systematic Literature Review.
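To make the interplay between a digital dictionary and morphological rules more concrete, here is a minimal sketch of a dictionary-checked, rule-based affix stripper. It is not any stemmer reviewed in the study: the root dictionary, the affix lists, and the `stem` function are hypothetical placeholders, and the affixes are Indonesian-style examples chosen only for illustration, not a verified rule set for any regional language.

```python
# Hypothetical toy root dictionary; a real stemmer would load a digital
# dictionary of base words for the target regional language.
ROOT_DICTIONARY = {"makan", "baca", "tulis", "main"}

# Illustrative affix lists (Indonesian-style placeholders), NOT a verified
# morphological rule set for any specific regional language.
PREFIXES = ("ber", "di", "ter")
SUFFIXES = ("kan", "an", "nya")


def stem(word: str, dictionary: set[str] = ROOT_DICTIONARY) -> str:
    """Strip at most one suffix and at most one prefix, accepting a candidate
    only if it appears in the root dictionary; otherwise return the word."""
    word = word.lower()
    if word in dictionary:
        return word

    # 1) Suffix-only candidates.
    candidates = [word[: -len(s)] for s in SUFFIXES if word.endswith(s)]

    # 2) Prefix-only candidates (from the full word), then
    # 3) prefix + suffix candidates (from the suffix-stripped forms).
    bases = [word] + candidates
    for base in bases:
        for prefix in PREFIXES:
            if base.startswith(prefix):
                candidates.append(base[len(prefix):])

    # Accept the first candidate that the dictionary confirms as a root.
    for candidate in candidates:
        if candidate in dictionary:
            return candidate
    return word


if __name__ == "__main__":
    for w in ["dimakan", "bermain", "tulisan", "bacakan"]:
        print(w, "->", stem(w))
```

Because a candidate is accepted only when the dictionary confirms it, dictionary coverage bounds what the stemmer can recover: with `makan` absent from the dictionary, `stem("dimakan", {"main"})` returns `"dimakan"` unchanged. This mirrors the review's observation that a larger base vocabulary improves stemming results.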

Citation Format

Abidin, Z., Junaidi, A., Wamiliana (2024). Text Stemming and Lemmatization of Regional Languages in Indonesia: A Systematic Literature Review. https://doi.org/10.20473/jisebi.10.2.217-231

Journal Information
Publication year: 2024
Language: English (en)
Total citations: 31
Source database: Semantic Scholar
DOI: 10.20473/jisebi.10.2.217-231
Access: Open Access