Accurate and interpretable prediction of chemical oxygen demand using explainable boosting algorithms with SHAP analysis
Abstract
Accurate prediction of Chemical Oxygen Demand (COD) is vital for effective water quality management and pollution control. This study compares six ensemble boosting models (AdaBoost, CatBoost, XGBoost, LightGBM, HistGBRT, and NGBoost) for estimating COD from multiple water quality parameters, including pH, dissolved oxygen, suspended solids, and specific conductance. Data from two monitoring stations in South Korea (Toilchun and Hwangji) were used to train and validate the models. Model performance was evaluated using RMSE, MAE, R, NSE, and PBIAS, while interpretability was assessed through SHapley Additive exPlanations (SHAP). NGBoost achieved the highest predictive accuracy at Toilchun (R = 0.979, NSE = 0.958, RMSE = 0.397 mg/L), while CatBoost performed best at Hwangji (R = 0.861, NSE = 0.733, RMSE = 0.477 mg/L). Because NGBoost provides predictive probability distributions rather than single estimates, its results also reflect model uncertainty, supporting a more robust quantification of COD variability. SHAP analysis identified total organic carbon (TOC), biochemical oxygen demand (BOD₅), and suspended solids (SS) as the most influential variables controlling COD dynamics.
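The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are the commonly used ones and should be read as assumptions: in particular, PBIAS is taken here as 100 · Σ(obs − sim) / Σ(obs), a convention under which positive values indicate underestimation, and the authors may use the opposite sign. The function name `evaluate_cod_predictions` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluate_cod_predictions(obs, sim):
    """Return RMSE, MAE, R, NSE and PBIAS for observed vs. simulated COD.

    Assumed definitions (not confirmed by the abstract):
      RMSE  = sqrt(mean((sim - obs)^2))
      MAE   = mean(|sim - obs|)
      R     = Pearson correlation coefficient
      NSE   = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
      PBIAS = 100 * sum(obs - sim) / sum(obs)
    """
    n = len(obs)
    if n == 0 or n != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")

    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    errors = [s - o for o, s in zip(obs, sim)]
    sq_err = sum(e * e for e in errors)

    # Sums of squared deviations and cross-deviations from the means.
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cross = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))

    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R": cross / math.sqrt(ss_obs * ss_sim),
        "NSE": 1.0 - sq_err / ss_obs,
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
    }


if __name__ == "__main__":
    # Hypothetical COD values in mg/L, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [1.1, 1.9, 3.2, 3.8]
    for name, value in evaluate_cod_predictions(observed, simulated).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, NSE equals 1 for a perfect fit and drops to 0 when the model is no better than predicting the observed mean, which is why it is often reported alongside R.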
Authors (9)
Khaled Merabet
Sungwon Kim
Salim Heddam
Fabio Di Nunno
Francesco Granata
Ozgur Kisi
Rana Muhammad Adnan
Mohammad Zounemat-Kermani
Christoph Külls
Quick Access
- Publication Year: 2026
- Source Database: DOAJ
- DOI: 10.1038/s41598-026-38757-4
- Access: Open Access