Interpretable machine learning for predicting compression index of clays using SHAP and gradient boosting models
Abstract
This study introduces a novel, interpretable machine learning framework for predicting the compression index (Cc) of clay soils by integrating three gradient boosting algorithms (XGBoost, CatBoost, and LightGBM) with SHapley Additive exPlanations (SHAP). A dataset of 1,243 clay samples, compiled from peer-reviewed literature, includes four geotechnical input variables: plastic limit (PL), plasticity index (PI), initial void ratio (e₀), and water content (w). Data were standardized and partitioned into training (70%) and testing (30%) subsets. Model development employed fivefold cross-validation and Optuna-based hyperparameter optimization. Among the models, XGBoost generalized best, achieving an R² of 0.913, an RMSE of 0.197, and an MAE of 0.100 on the test set. SHAP analysis revealed that initial void ratio (e₀) and water content (w) were the most influential features, with mean SHAP values of 0.20 and 0.10, respectively, in line with established geotechnical principles. By making the model’s decision process understandable, the proposed framework addresses the opacity of traditional “black-box” AI. It offers a reliable and efficient alternative to conventional oedometer testing, particularly for preliminary geotechnical design, where timely and interpretable predictions are essential.
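To make the SHAP attributions mentioned in the abstract more concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature subset. It is an illustration only: `toy_cc_model`, its coefficients, and the sample and baseline values are invented for this example and are not taken from the study. The study itself applies SHAP to trained gradient boosting models, which would normally rely on a tree-specific algorithm rather than this brute-force enumeration over a single baseline point.

```python
from itertools import combinations
from math import factorial

# The four input variables named in the abstract.
FEATURES = ["PL", "PI", "e0", "w"]


def toy_cc_model(x):
    """Hypothetical stand-in for a trained Cc model (coefficients are invented)."""
    return 0.001 * x["PL"] + 0.002 * x["PI"] + 0.30 * x["e0"] + 0.004 * x["w"]


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to one baseline point.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(FEATURES)
    phi = {}
    for feat in FEATURES:
        others = [f for f in FEATURES if f != feat]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with the coalition's features "switched on".
                without = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
                # Same input, with the feature of interest switched on as well.
                with_feat = dict(without)
                with_feat[feat] = x[feat]
                total += weight * (model(with_feat) - model(without))
        phi[feat] = total
    return phi


# Invented sample and baseline values, for illustration only.
sample = {"PL": 25.0, "PI": 30.0, "e0": 1.2, "w": 45.0}
baseline = {"PL": 22.0, "PI": 20.0, "e0": 0.9, "w": 35.0}

phi = shapley_values(toy_cc_model, sample, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name. Averaging the absolute attributions over many samples gives a global importance ranking of the kind the abstract reports.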
Topics & Keywords
Authors (5)
Khaled Hamdaoui
Ali Benzaamia
Billal Sari Ahmed
Mohamed Elhebib Guellil
Mohamed Ghrici
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.1186/s44147-025-00727-4
- Access
- Open Access ✓