Hourly ozone concentration estimation and its health impact study based on ensemble machine learning: A case study of Taiyuan City
Abstract
Background: Ozone (O3) is a major air pollutant. The existing monitoring network has unevenly distributed sites, insufficient coverage in underdeveloped areas, and low temporal resolution, which makes hourly data difficult to obtain. This limits the dynamic identification of pollution and the formulation of prevention and control strategies.
Objective: To construct an hourly O3 concentration estimation model based on ensemble machine learning, with the aim of improving the accuracy of pollution exposure assessment and exploring the health impacts of O3.
Methods: This study integrated land use regression modeling with machine learning techniques. Random forest and XGBoost algorithms were used to construct the base models, which were then combined by stacking with non-negative least squares. The ensemble model was trained and validated across China using high-resolution, multi-source geographic data (e.g., meteorological data, population density, land cover types, and aerosol optical thickness). It was then tested in Taiyuan City and combined with a distributed lag non-linear model to analyze the association between O3 and emergency admissions.
Results: The ensemble model performed well in predicting O3 concentration, with a higher coefficient of determination (R²) and a lower root-mean-square error (RMSE) than the single models. The R² improved from 0.90 to 0.92, and the RMSE decreased from 11.41 to 10.62, indicating better prediction accuracy and generalization ability. In the application to Taiyuan City, the model successfully imputed hourly data for the entire year. The distributed lag non-linear model analysis showed that the relative risk (RR) values for the 6th to 8th days following O3 exposure were 1.14 (95%CI: 1.01, 1.29), 1.16 (95%CI: 1.02, 1.31), and 1.14 (95%CI: 1.01, 1.29), respectively. All three were significantly higher than 1, indicating a significant lagged association (lag 6-8 d) between O3 and the number of emergency room visits.
Conclusion: A high-precision, hourly O3 concentration estimation model was successfully constructed by combining the land use regression model with an ensemble machine learning approach, providing a scientific basis for environmental policy formulation and public health intervention. The application of the model in Taiyuan City verifies its generalization ability and practical value, and it can provide a new technical framework for subsequent environmental health research.
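The stacking step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the observed values and the two sets of base-model predictions (`pred_rf`, `pred_xgb`) are synthetic stand-ins invented for illustration, and only the non-negative least squares combination is shown, using `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic stand-in for observed hourly O3 values (illustrative only,
# not data from the study).
y = rng.gamma(shape=4.0, scale=20.0, size=2000)

# Hypothetical out-of-fold predictions from two base learners. In a real
# workflow these would come from cross-validated random forest and
# XGBoost models; here they are simulated as noisy copies of y.
pred_rf = y + rng.normal(0.0, 12.0, size=y.size)
pred_xgb = y + rng.normal(0.0, 11.0, size=y.size)

# Stack the base predictions column-wise and solve
#   min_w ||P w - y||_2  subject to  w >= 0
P = np.column_stack([pred_rf, pred_xgb])
weights, _residual_norm = nnls(P, y)

# The ensemble prediction is the non-negatively weighted sum.
stacked = P @ weights


def rmse(obs, est):
    """Root-mean-square error between observations and estimates."""
    return float(np.sqrt(np.mean((obs - est) ** 2)))


print("weights:", weights)
print("RMSE rf / xgb / stacked:",
      rmse(y, pred_rf), rmse(y, pred_xgb), rmse(y, stacked))
```

Because assigning all weight to a single base model is itself a feasible solution, the stacked fit cannot have a larger in-sample RMSE than either base model. Whether that gain carries over to new data depends on fitting the weights to out-of-fold predictions rather than to in-sample ones.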
Topics & Keywords
Authors (7)
Rule DU
Xiaojuan YANG
Ruixia NIU
Yang XU
Guiming ZHU
Qian GAO
Tong WANG
Quick Access
- Year of Publication
- 2026
- Source Database
- DOAJ
- DOI
- 10.11836/JEOM25283
- Access
- Open Access ✓