Efficient Big Data Processing and Recommendation System Development with Apache Spark
Abstract
The rapid development of big data analytics has revolutionized data analysis and decision-making processes across industries. This paper explores how to use Apache Spark to analyze the MovieLens 20M dataset and identify the top movies in Minnesota. By integrating robust data preprocessing and collaborative filtering techniques, a novel recommendation system is developed. The results reveal the popular movies in Minnesota, major genres such as drama and comedy, and related tags such as “original” and “finale.” Additionally, a detailed tag correlation analysis is conducted to optimize recommendation accuracy. The study further illustrates Spark's application in large-scale data processing, demonstrating its effectiveness in recommendation systems. These findings bridge the gap between theoretical frameworks and practical applications, providing a replicable approach to address challenges in preprocessing, analysis, and personalized recommendations.
Authors (2)
Shanqi Zhan
Yujuan Qiu
- Publication Year
- 2025
- Language
- en
- Source Database
- Semantic Scholar
- DOI
- 10.1109/ISBDAS64762.2025.11116871
- Access
- Open Access ✓