Navigating the missing data maze: exploring multiple imputation techniques for environmental performance index data
Abstract
The Environmental Performance Index (EPI) is widely used to assess a country’s environmental sustainability in terms of climate, environmental health, and ecosystem health. However, substantial missing data can bias the index, so that some countries are unfairly penalised or rewarded. Treating missing values consistently across countries would make the EPI more useful. This study compared multiple imputation methods, specifically MICEForest, kNN, and MissForest imputation, on EPI data with missingness ranging from 1% to over 50%. It also evaluated how well these methods handle Missing Not At Random (MNAR) data, specifically the fisheries and maritime activities indicators for landlocked countries. Landlocked countries were assumed to be able to access the sea through various means, but limited information was available on this. The results showed that all three methods produced imputed data closely matching the original data, with minimal impact on central tendencies, as indicated by low MAE, RMSE, MAPE, and WAPE values. Sensitivity analysis revealed that MissForest and kNN were more stable and consistent than MICEForest across all error metrics when parameters were adjusted. Future research may explore deep learning techniques for handling missing data in environmental datasets such as the EPI.
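The four error metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the definitions below are the common textbook ones and should be read as an assumption; the function name `imputation_errors` and the sample values are hypothetical and are not taken from the EPI data. MAPE is undefined when a true value is zero, so the sketch assumes all true values are non-zero.

```python
import numpy as np

def imputation_errors(true_vals, imputed_vals):
    """Compare imputed values against the held-out true values.

    Assumes common textbook definitions of the metrics and that
    no true value is zero (MAPE divides by the true values).
    """
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    err = imputed_vals - true_vals
    abs_err = np.abs(err)
    return {
        # Mean absolute error
        "MAE": abs_err.mean(),
        # Root mean squared error
        "RMSE": np.sqrt((err ** 2).mean()),
        # Mean absolute percentage error (per-value ratios, averaged)
        "MAPE": (abs_err / np.abs(true_vals)).mean() * 100,
        # Weighted absolute percentage error (total error over total magnitude)
        "WAPE": abs_err.sum() / np.abs(true_vals).sum() * 100,
    }

# Hypothetical indicator scores that were masked, then imputed
truth = np.array([50.0, 80.0, 20.0, 40.0])
imputed = np.array([55.0, 76.0, 22.0, 40.0])

scores = imputation_errors(truth, imputed)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

WAPE weights each error by the size of the true value, so it is less sensitive than MAPE to small true values, which is one reason the two are often reported together.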
Authors (5)
Muhammed Haziq Muhammed Nor
Mohd Aftar Abu Bakar
Noratiqah Mohd Ariff
Wan Hanna Melini Wan Mohtar
Bernard Kok Bang Lee
Quick Access
- Year published: 2025
- Database source: DOAJ
- DOI: 10.1088/2515-7620/add8e7
- Access: Open Access ✓