Semantic Scholar Open Access 2025

Efficient Big Data Processing and Recommendation System Development with Apache Spark

Shanqi Zhan, Yujuan Qiu

Abstract

The rapid development of big data analytics has revolutionized data analysis and decision-making processes across industries. This paper explores how to use Apache Spark to analyze the MovieLens 20M dataset and identify the top movies in Minnesota. By integrating robust data preprocessing and collaborative filtering techniques, a novel recommendation system is developed. The results reveal the popular movies in Minnesota, major genres such as drama and comedy, and related tags such as “original” and “finale.” Additionally, a detailed tag correlation analysis is conducted to optimize recommendation accuracy. The study further illustrates Spark's application in large-scale data processing, demonstrating its effectiveness in recommendation systems. These findings bridge the gap between theoretical frameworks and practical applications, providing a replicable approach to address challenges in preprocessing, analysis, and personalized recommendations.
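The two steps the abstract names, ranking top movies and making collaborative-filtering recommendations, can be illustrated with a minimal sketch. This is not the authors' Spark pipeline: it is plain Python on hypothetical toy ratings, and it swaps in a simple item-based cosine-similarity scheme, since the abstract does not say which collaborative filtering variant was used. The names `RATINGS`, `top_movies`, `cosine`, and `recommend` are illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings as (user_id, movie_id, rating); not MovieLens data.
RATINGS = [
    ("u1", "A", 5.0), ("u1", "B", 4.0),
    ("u2", "A", 4.0), ("u2", "B", 5.0), ("u2", "C", 1.0),
    ("u3", "A", 1.0), ("u3", "C", 5.0),
    ("u4", "A", 5.0),
]


def top_movies(ratings, min_count=2):
    """Rank movies by mean rating, ignoring movies with fewer than min_count ratings."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        sums[movie] += rating
        counts[movie] += 1
    ranked = [
        (m, sums[m] / counts[m], counts[m]) for m in sums if counts[m] >= min_count
    ]
    # Highest mean first; ties broken by rating count, then by movie id.
    return sorted(ranked, key=lambda row: (-row[1], -row[2], row[0]))


def cosine(a, b):
    """Cosine similarity of two {user: rating} vectors; missing ratings count as 0."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, k=2):
    """Score each unseen movie by similarity-weighted ratings of the user's seen movies."""
    by_movie = defaultdict(dict)
    for u, movie, rating in ratings:
        by_movie[movie][u] = rating
    seen = {m: r[user] for m, r in by_movie.items() if user in r}
    scores = {}
    for candidate, vec in by_movie.items():
        if candidate in seen:
            continue
        # Unnormalized score: a ranking signal, not a predicted rating.
        scores[candidate] = sum(cosine(by_movie[m], vec) * r for m, r in seen.items())
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]
```

Calling `top_movies(RATINGS)` ranks the toy movies by mean rating, and `recommend(RATINGS, "u4")` orders the movies that user `u4` has not rated. On a dataset the size of MovieLens 20M the same aggregation and similarity logic would need a distributed engine, which is the role Spark plays in the paper.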

Authors (2)

Shanqi Zhan

Yujuan Qiu

Citation

Zhan, S., & Qiu, Y. (2025). Efficient Big Data Processing and Recommendation System Development with Apache Spark. https://doi.org/10.1109/ISBDAS64762.2025.11116871

Journal Information

Publication Year: 2025
Language: en
Source Database: Semantic Scholar
DOI: 10.1109/ISBDAS64762.2025.11116871
Access: Open Access ✓