Semantic Scholar, Open Access, 2023, 1 citation

Unlocking the Potential: The Crucial Role of Data Preprocessing in Big Data Analytics

Praveen Kantha, V. Sinha, Durgesh Srivastava, Basant Sah

Abstract

Internet access can significantly expand the capabilities and opportunities of data mining. It provides a vast supply of data, tools, and resources for improving the data mining process, with sources that include social media, websites, and online databases. The effectiveness of data mining depends on the ability to extract meaningful patterns and models from a large dataset; its goal is to uncover previously unknown information in large databases. However, the information in current datasets is not always unified and clean. Despite extensive development and fine-tuning, data mining models remain highly dependent on the quality of the data they are fed. This research focuses on the steps taken before data is fed into a machine learning system. The success of any machine learning (ML) algorithm rests largely on the quality of its input data. Although many factors influence how well ML performs a task, the representation and quality of the instance data remain key to the algorithm's overall effectiveness. Knowledge discovery becomes harder during the training phase when the data contains abundant irrelevant and duplicated information or is noisy and unreliable. Data preparation and filtering are well known to consume a substantial share of processing time in ML problems. The output of data preprocessing is the final training set. Access to data, secure data handling, a robust network infrastructure, and the support of the IT industry are all critical to successful data mining. This article therefore offers strategies for optimizing data collection performance at every stage of data preprocessing.
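
The abstract names three data problems that preprocessing addresses before training: incomplete records, duplicated records, and noisy values. The following is a minimal illustrative sketch of such a pipeline, not the authors' method. The function name `preprocess`, its parameters, and the choice of a median-absolute-deviation filter for noise are all assumptions made for this example.

```python
from statistics import median


def preprocess(rows, required, numeric_field, max_dev=3.0):
    """Hypothetical helper: return rows fit for use as a training set.

    rows          -- list of dicts, one per record
    required      -- field names that must be present and not None
    numeric_field -- a field in `required` that is checked for outliers
    max_dev       -- allowed distance from the median, in MAD units
    """
    # Step 1: drop incomplete records (any required field missing or None).
    complete = [r for r in rows if all(r.get(k) is not None for k in required)]

    # Step 2: drop duplicates, keeping the first occurrence.
    # Two records count as duplicates when all required fields match.
    seen = set()
    unique = []
    for r in complete:
        key = tuple(r[k] for k in required)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    if not unique:
        return []

    # Step 3: filter noisy values using the median absolute deviation (MAD),
    # an assumption chosen here because it is robust to extreme outliers.
    values = [r[numeric_field] for r in unique]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return unique  # no spread to measure against, so keep everything
    return [r for r in unique if abs(r[numeric_field] - med) / mad <= max_dev]


if __name__ == "__main__":
    raw = [
        {"id": 1, "age": 30},
        {"id": 1, "age": 30},    # duplicate
        {"id": 2, "age": None},  # incomplete
        {"id": 3, "age": 32},
        {"id": 4, "age": 31},
        {"id": 5, "age": 29},
        {"id": 6, "age": 400},   # noisy value
    ]
    cleaned = preprocess(raw, required=("id", "age"), numeric_field="age")
    print([r["id"] for r in cleaned])
```

Each step shrinks the dataset before the next one runs, so the outlier statistics in step 3 are computed only over complete, de-duplicated records.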

Authors (4)

Praveen Kantha

V. Sinha

Durgesh Srivastava

Basant Sah

Citation Format

Kantha, P., Sinha, V., Srivastava, D., Sah, B. (2023). Unlocking the Potential: The Crucial Role of Data Preprocessing in Big Data Analytics. https://doi.org/10.1109/IDICAIEI58380.2023.10406577

Journal Information
Publication Year: 2023
Language: English
Total Citations: 1
Source Database: Semantic Scholar
DOI: 10.1109/IDICAIEI58380.2023.10406577
Access: Open Access