Semantic Scholar · Open Access · 2022 · 27 citations

Albanian Fake News Detection

Ercan Canhasi, Rexhep Shijaku, Erblin Berisha

Abstract

Recent years have witnessed a vast increase in the phenomenon known as fake news. Among the main reasons for this increase are the continuous growth of internet and social media usage and the opportunity they offer for real-time information dissemination. Deceptive and misleading content such as fake news, especially the type made by and for social media users, is becoming eminently hazardous. Hence, fake news detection has become an important research topic. Despite recent advances in fake news detection, the lack of fake news corpora for under-resourced languages compromises the development and evaluation of existing approaches in these languages. To fill this gap, this article investigates fake news detection for the Albanian language. We present a new public dataset of labeled true and fake news in Albanian and perform an extensive analysis of machine learning methods for fake news detection. We performed comprehensive feature engineering and feature selection experiments, exploring feature categories related to the Albanian language such as lexical, syntactic, lying-detection, and psycho-linguistic features. Each article was also modeled in four different ways: with the traditional bag-of-words (BoW) representation and with three distributed text representations using the state-of-the-art Word2Vec, FastText, and BERT methods. Additionally, we investigated the best combination of features and various types of classification methods. The results of these experiments and evaluations are used to draw conclusions that shed light on the potential of the methods and the challenges that Albanian fake news detection presents.
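
To make the bag-of-words (BoW) representation mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not name the classifiers or the exact features used, so a nearest-centroid rule with cosine similarity stands in as an assumed, deliberately simple classifier. The function names (`tokenize`, `bag_of_words`, `train_centroids`, `predict`) and the tiny English training set are hypothetical and invented purely for illustration; they are not drawn from the paper's dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters only.

    The pattern is Unicode-aware, so Albanian letters such as "ë" and "ç"
    are kept inside tokens; digits and punctuation are dropped.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def bag_of_words(text):
    """Represent a text as a mapping from token to occurrence count."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_texts):
    """Sum the BoW vectors of each class.

    Cosine similarity ignores vector length, so the per-class sum behaves
    like a class centroid here.
    """
    centroids = {}
    for text, label in labeled_texts:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def predict(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy examples (assumption: NOT taken from the paper's dataset).
TOY_TRAIN = [
    ("shocking secret cure doctors hide", "fake"),
    ("you will not believe this shocking miracle", "fake"),
    ("parliament approved the annual budget today", "true"),
    ("the ministry published the annual report", "true"),
]

centroids = train_centroids(TOY_TRAIN)
print(predict(centroids, "shocking miracle cure"))
print(predict(centroids, "the ministry approved the budget"))
```

In a realistic setting, the count vectors would typically be replaced or complemented by the distributed representations the abstract lists (Word2Vec, FastText, BERT) and fed to a trained classifier; the sketch only shows how a BoW vector turns an article into something a classifier can compare.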

Authors (3)

Ercan Canhasi

Rexhep Shijaku

Erblin Berisha

Citation Format

Canhasi, E., Shijaku, R., Berisha, E. (2022). Albanian Fake News Detection. https://doi.org/10.1145/3487288

Quick Access

View at source: doi.org/10.1145/3487288
Journal Information
Publication Year: 2022
Language: en
Total Citations: 27
Source Database: Semantic Scholar
DOI: 10.1145/3487288
Access: Open Access