Data-Driven Risk Stratification for High-Cost Care Management: An Empirical Evaluation of Generalized and Regularized Models
Abstract
This paper examines data-driven methods for identifying high-cost patients in European health systems, assessing both the predictive accuracy and the interpretability of generalized and regularized statistical models. We formulate binary classification tasks on a large health insurance dataset to identify individuals in two upper tiers of overall healthcare spending. Three modeling paradigms, Generalized Linear Models (GLM), Generalized Additive Models (GAM), and LASSO regression, are applied and compared in terms of predictive accuracy and practical interpretability. We find that GAM consistently performs best, yielding the highest F1 scores and the lowest log loss, and captures nonlinear associations in health care consumption better than GLM or LASSO. The frequency of surgeries, the frequency of hospitalizations, and the duration of insurance coverage prove to be key determinants of high-cost status, while demographic attributes such as gender have a moderate impact. The comparisons highlight the potential of interpretable yet flexible models to enable proactive, risk-based interventions. By presenting evidence on the trade-off between predictive accuracy and interpretability, the paper supports more efficient high-cost care management and offers pragmatic advice to European health systems on allocating resources efficiently in light of avoidable health care spending.
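The evaluation described in the abstract rests on two steps: turning spending into a binary high-cost label, and scoring each model with F1 and log loss. The following is a minimal sketch of those steps in plain Python, not the paper's implementation. The function names, the 20% cutoff, the spending figures, and the predicted probabilities are all hypothetical values chosen for illustration.

```python
import math


def label_high_cost(spending, top_fraction):
    """Flag the top `top_fraction` of spenders as high-cost (1), the rest as 0."""
    ranked = sorted(spending, reverse=True)
    n_high = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[n_high - 1]  # smallest spending value still counted as high-cost
    return [1 if s >= cutoff else 0 for s in spending]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (high-cost) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical annual spending per insured person (illustrative values only).
spending = [120, 80, 15000, 300, 950, 42000, 60, 700, 2200, 510]
y_true = label_high_cost(spending, top_fraction=0.2)  # hypothetical 20% cutoff

# Hypothetical predicted probabilities from some fitted model.
y_prob = [0.1, 0.1, 0.8, 0.2, 0.3, 0.9, 0.1, 0.2, 0.6, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("labels:  ", y_true)
print("F1:      ", f1_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
```

A higher F1 and a lower log loss indicate a better model under these metrics, which is the sense in which the abstract reports GAM ahead of GLM and LASSO. The 0.5 decision threshold used here is also an assumption; with a rare positive class, a different threshold may be more appropriate.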
Topics & Keywords
Authors (1)
Eslam Abdelhakim Seyam
Quick Access
- Year Published
- 2025
- Source Database
- DOAJ
- DOI
- 10.48161/qaj.v5n4a2198
- Access
- Open Access ✓