arXiv
Open Access
2017
Feature selection in high-dimensional dataset using MapReduce
Claudio Reggiani
Yann-Aël Le Borgne
Gianluca Bontempi
Abstract
This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
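The greedy selection loop behind minimum Redundancy Maximum Relevance can be sketched in a map/reduce style as follows. This is a minimal single-process illustration, not the authors' Hadoop/Spark implementation. It assumes discrete features, mutual information as the relevance and redundancy score, and the "relevance minus mean redundancy" criterion. The names `map_counts`, `mutual_information`, and `mrmr` are hypothetical. Each chunk of rows plays the role of one mapper's input in the tall/narrow layout.

```python
from collections import Counter
from functools import reduce
from math import log


def map_counts(chunk, pairs):
    """Map step (illustrative): joint value counts per column pair for one chunk of rows."""
    counts = Counter()
    for row in chunk:
        for a, b in pairs:
            counts[(a, b, row[a], row[b])] += 1
    return counts


def mutual_information(counts, a, b, n):
    """Mutual information (in nats) of columns a and b from merged joint counts."""
    joint = {(va, vb): c for (i, j, va, vb), c in counts.items() if (i, j) == (a, b)}
    pa, pb = Counter(), Counter()
    for (va, vb), c in joint.items():
        pa[va] += c
        pb[vb] += c
    return sum(c / n * log(c * n / (pa[va] * pb[vb])) for (va, vb), c in joint.items())


def mrmr(chunks, n_features, target, k):
    """Greedy mRMR: pick k of the feature columns 0..n_features-1 against column `target`."""
    n = sum(len(chunk) for chunk in chunks)

    def scores(pairs):
        # Reduce step: sum the per-chunk counters, then score each requested pair.
        total = reduce(lambda x, y: x + y, (map_counts(c, pairs) for c in chunks), Counter())
        return {p: mutual_information(total, p[0], p[1], n) for p in pairs}

    candidates = list(range(n_features))
    relevance = scores([(j, target) for j in candidates])
    redundancy = {j: 0.0 for j in candidates}  # running sum of MI with selected features
    selected = []

    while candidates and len(selected) < k:
        def score(j):
            penalty = redundancy[j] / len(selected) if selected else 0.0
            return relevance[(j, target)] - penalty

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
        # One more map/reduce pass: MI between the new pick and each remaining candidate.
        for (j, _), mi in scores([(j, best) for j in candidates]).items():
            redundancy[j] += mi
    return selected


# Toy data (made up for illustration): columns 0-2 are features, column 3 is the target.
# Column 1 duplicates column 0; column 2 carries complementary information.
rows = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 2), (1, 1, 1, 3)]
chunks = [rows[:2], rows[2:]]  # two "mapper" inputs
selected = mrmr(chunks, n_features=3, target=3, k=2)
print(selected)
```

On this toy input the duplicated column is penalised by its redundancy with the first pick, so the complementary feature is chosen instead. In a real cluster the per-chunk counting would run on separate workers and only the small count tables would be merged.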
Journal Information
- Publication year: 2017
- Language: en
- Source database: arXiv
- Access: Open Access