Semantic Scholar Open Access 2025

Efficient Big Data Processing and Recommendation System Development with Apache Spark

Shanqi Zhan, Yujuan Qiu

Abstract

The rapid development of big data analytics has revolutionized data analysis and decision-making processes across industries. This paper explores how to use Apache Spark to analyze the MovieLens 20M dataset and identify the top movies in Minnesota. By integrating robust data preprocessing and collaborative filtering techniques, a novel recommendation system is developed. The results reveal the popular movies in Minnesota, major genres such as drama and comedy, and related tags such as “original” and “finale.” Additionally, a detailed tag correlation analysis is conducted to optimize recommendation accuracy. The study further illustrates Spark's application in large-scale data processing, demonstrating its effectiveness in recommendation systems. These findings bridge the gap between theoretical frameworks and practical applications, providing a replicable approach to address challenges in preprocessing, analysis, and personalized recommendations.

Authors (2)

Shanqi Zhan

Yujuan Qiu

Citation Format

Zhan, S., Qiu, Y. (2025). Efficient Big Data Processing and Recommendation System Development with Apache Spark. https://doi.org/10.1109/ISBDAS64762.2025.11116871

Journal Information
Publication Year
2025
Language
en
Source Database
Semantic Scholar
DOI
10.1109/ISBDAS64762.2025.11116871
Access
Open Access ✓