DOAJ Open Access 2025

Machine learning-based models for screening of anemia and leukemia using features of complete blood count reports

Hafsa Amjad, Zamir Hussain, Mahnoor Hasan, Mahmood Ul Hassan

Abstract

Complete blood count (CBC) report features are routinely used to screen for a wide array of hematological disorders. However, the complexity of disease overlap increases the probability of neglecting the underlying patterns between these features, and the heterogeneity associated with the subjective assessment of CBC reports often leads to random clinical testing. Such disease prediction analyses can be enhanced by incorporating machine learning (ML) algorithms for efficient handling of CBC features. Hybrid synthetic data are generated based on the statistical distribution of features to overcome the constraint of small sample size (N = 287). To the best of our knowledge, our study is the first to employ hybrid synthetic data for modeling hematological parameters. Six ML models, i.e., decision tree, random forest, support vector machine, logistic regression, gradient boosting machine, and multilayer perceptron, are tested for disease prediction. This research presents ML-based models for the screening of two common blood disorders, anemia and leukemia, using CBC report features. A ‘fingerprint’ of 14 out of 21 features, selected on the basis of both statistical and clinical relevance, is used for model development. The random forest algorithm shows exceptional performance, with 98% accuracy and macro-averaged precision, recall, specificity, and miss rate of 97%, 98%, 99%, and 2%, respectively, across all classes. However, external validation reveals poor generalizability to a dataset from a different demographic, on which the model attains an accuracy of 74%. The proposed methodology may serve as an efficient support system for the screening of anemia and leukemia, but extensive optimization of its generalizability is warranted.
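The macro-averaged metrics reported in the abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of a multiclass confusion matrix, which is the common convention but is not confirmed by this record; the function name `macro_metrics`, the three-class layout, and all counts below are hypothetical and are not the study's data.

```python
def macro_metrics(cm):
    """Macro-averaged precision, recall, specificity, and miss rate.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Each class is scored one-vs-rest, then the per-class scores are averaged.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sums = {"precision": 0.0, "recall": 0.0, "specificity": 0.0, "miss_rate": 0.0}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # predicted k, true class otherwise
        tn = total - tp - fn - fp
        sums["precision"] += tp / (tp + fp) if tp + fp else 0.0
        sums["recall"] += tp / (tp + fn) if tp + fn else 0.0
        sums["specificity"] += tn / (tn + fp) if tn + fp else 0.0
        sums["miss_rate"] += fn / (tp + fn) if tp + fn else 0.0
    return {name: value / n for name, value in sums.items()}


# Hypothetical 3-class confusion matrix with invented counts (illustration only).
example_cm = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 1, 49],
]
metrics = macro_metrics(example_cm)
accuracy = sum(example_cm[k][k] for k in range(3)) / sum(map(sum, example_cm))
print(accuracy, metrics)
```

Under this convention the miss rate of each class is one minus its recall, which matches the 98% recall and 2% miss rate pairing quoted above.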

Authors (4)

Hafsa Amjad
Zamir Hussain
Mahnoor Hasan
Mahmood Ul Hassan

Citation Format

Amjad, H., Hussain, Z., Hasan, M., Hassan, M.U. (2025). Machine learning-based models for screening of anemia and leukemia using features of complete blood count reports. https://doi.org/10.1038/s41598-025-21279-w

Journal Information

Publication Year: 2025
Source Database: DOAJ
DOI: 10.1038/s41598-025-21279-w
Access: Open Access