Detecting Fake News in Urdu Language Using Machine Learning, Deep Learning, and Large Language Model-Based Approaches
Abstrak
Fake news is false or misleading information that looks like real news and spreads through traditional and social media. It has a big impact on our social lives, especially in politics. In Pakistan, where Urdu is the main language, finding fake news in Urdu is difficult because there are not many effective systems for this. This study aims to solve this problem by creating a detailed process and training models using machine learning, deep learning, and large language models (LLMs). The research uses methods that look at the features of documents and classes to detect fake news in Urdu. Different models were tested, including machine learning models like Naïve Bayes and Support Vector Machine (SVM), as well as deep learning models like Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM), which used embedding techniques. The study also used advanced models like BERT and GPT to improve the detection process. These models were first evaluated on the Bend-the-Truth dataset, where CNN achieved an F1 score of 72%, Naïve Bayes scored 78%, and the BERT Transformer achieved the highest F1 score of 79% on Bend the Truth dataset. To further validate the approach, the models were tested on a more diverse dataset, Ax-to-Grind, where both SVM and LSTM achieved an F1 score of 89%, while BERT outperformed them with an F1 score of 93%.
Topik & Kata Kunci
Penulis (4)
Muhammad Shoaib Farooq
Syed Muhammad Asadullah Gilani
Muhammad Faraz Manzoor
Momina Shaheen
Akses Cepat
- Tahun Terbit
- 2025
- Sumber Database
- DOAJ
- DOI
- 10.3390/info16070595
- Akses
- Open Access ✓