DOAJ Open Access 2025

Estimating 10-Year Cardiovascular Disease Risk in Primary Prevention Using UK Electronic Health Records and a Hybrid Multitask BERT Model: Retrospective Cohort Study

Tianyi Liu, Lei Lu, Yanzhong Wang, Andrew J Krentz, Vasa Curcin

Abstract

Background: Cardiovascular disease (CVD) remains a leading cause of preventable morbidity and mortality, highlighting the need for early risk stratification in primary prevention. Traditional Cox models assume proportional hazards and linear effects, which limits their flexibility. Machine learning offers greater expressiveness, but many models rely solely on structured data and overlook time-to-event (TTE) information. Integrating structured and textual representations may enhance prediction and support equitable assessment across clinical subgroups.

Objective: This study aims to develop a hybrid multitask deep learning model (MT-BERT [multitask Bidirectional Encoder Representations from Transformers]) that integrates structured and textual features from electronic health records (EHRs) to predict 10-year CVD risk, enhancing individualized stratification and supporting equitable assessment across diverse demographic groups.

Methods: We used data from Clinical Practice Research Datalink (CPRD) Aurum comprising 469,496 patients aged 40‐85 years to develop MT-BERT for 10-year CVD risk prediction. Structured EHR variables and their corresponding textual representations were jointly encoded using a multilayer perceptron and a distilled version of the BERT model (DistilBERT), respectively. A fusion layer and stacked multihead attention modules enabled cross-modal interaction modeling. The model generated both binary classification outputs and TTE risk scores, optimized using a custom FocalCoxLoss function with uncertainty-based weighting. Prediction targets encompassed composite and individual CVD outcomes. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), concordance index, and Brier score, with subgroup analyses by ethnicity and deprivation, and heterogeneity assessed using the Higgins I² statistic.

Results: The MT-BERT model yielded AUROC values of 0.744 (95% CI 0.738‐0.749) in males and 0.782 (95% CI 0.768‐0.796) in females on the test set (n=711,052), and 0.736 (95% CI 0.729‐0.741) and 0.775 (95% CI 0.768‐0.780), respectively, in “spatial external” validation (n=144,370). Brier scores were 0.130 in males and 0.091 in females. Individuals classified as high-risk (≥40% risk in males and ≥34% in females) demonstrated significantly reduced 10-year event-free survival relative to lower-risk individuals (log-rank P […]). […] I² […]

Conclusions: The proposed hybrid MT-BERT model predicts 10-year CVD risk for primary prevention by integrating structured variables and unstructured clinical text from EHRs. Its multitask design facilitates both individualized risk stratification and TTE estimation. While performance was modestly reduced in deprived and minority ethnic subgroups, these findings provide preliminary support for advancing equity-aware, data-driven prevention strategies in increasingly diverse health care settings.
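The abstract names a custom FocalCoxLoss with uncertainty-based weighting but does not define it. As a rough illustration only, the sketch below combines a binary focal loss (classification head) with a Cox negative partial log-likelihood (TTE head) and weights the two with learned log-variances, one common form of uncertainty weighting. All function names, the default `gamma` and `alpha` values, and the exact weighting formula are assumptions, not the authors' implementation.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean binary focal loss; gamma and alpha defaults are assumed values."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Cox negative partial log-likelihood, averaged over observed events.

    The risk set of subject i is every subject j with times[j] >= times[i].
    """
    n_events = sum(events)
    if n_events == 0:
        return 0.0
    total = 0.0
    for eta_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored subjects only appear inside risk sets
        risk_set = sum(
            math.exp(eta_j) for eta_j, t_j in zip(risk_scores, times) if t_j >= t_i
        )
        total += eta_i - math.log(risk_set)
    return -total / n_events


def uncertainty_weighted(loss_cls, loss_tte, log_var_cls=0.0, log_var_tte=0.0):
    """Combine two task losses as sum of exp(-s_k) * L_k + s_k.

    s_k is a log-variance that would be a trainable parameter in practice;
    this particular form is an assumption, not taken from the paper.
    """
    return (
        math.exp(-log_var_cls) * loss_cls + log_var_cls
        + math.exp(-log_var_tte) * loss_tte + log_var_tte
    )


# Toy batch: predicted event probabilities, risk scores, follow-up times, event flags.
probs = [0.8, 0.3, 0.6]
scores = [1.2, -0.4, 0.5]
times = [2.0, 9.5, 5.0]
events = [1, 0, 1]
combined = uncertainty_weighted(
    focal_loss(probs, events),
    cox_neg_partial_log_likelihood(scores, times, events),
)
```

In a real training loop the two log-variances would be optimized jointly with the network weights, so that the balance between the classification and TTE objectives is learned rather than hand-tuned.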

Authors (5)

Tianyi Liu

Lei Lu

Yanzhong Wang

Andrew J Krentz

Vasa Curcin

Citation Format

Liu, T., Lu, L., Wang, Y., Krentz, A.J., Curcin, V. (2025). Estimating 10-Year Cardiovascular Disease Risk in Primary Prevention Using UK Electronic Health Records and a Hybrid Multitask BERT Model: Retrospective Cohort Study. https://doi.org/10.2196/76659

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.2196/76659
Access: Open Access