DOAJ Open Access 2025

Feature Selection and Class Imbalance Machine Learning for Early Detection of Thyroid Cancer Recurrence: A Performance-Based Analysis

Agus Wantoro, Wahyu Caesarendra, Admi Syarif, Hari Soetanto

Abstract

Early detection of thyroid cancer recurrence is a crucial factor in patient survival and treatment effectiveness. Missed or incorrect detection leads to greater disease severity, higher costs, longer recovery time, and lower service quality. The main challenges in developing a Machine Learning (ML)-based detection decision support system are class imbalance in medical data and high feature dimensionality, both of which can affect model accuracy and efficiency. This study proposes an approach that combines feature selection with class imbalance handling to improve the performance of early detection of thyroid cancer. Feature selection techniques such as Information Gain (IG), Gain Ratio (GR), Gini Decrease (GD), and Chi-Square (CS) rank features by weight and select them accordingly. To overcome the imbalanced class distribution, we use the Synthetic Minority Over-Sampling Technique (SMOTE). ML classification models, namely k-NN, Tree, SVM, Naive Bayes, AdaBoost, Neural Network (NN), and Logistic Regression (LR), are tested and evaluated using confusion-matrix metrics (accuracy, precision, and recall) as well as time and log loss. Experimental results show that combining feature selection with class imbalance handling significantly improves the prediction performance of the ML algorithms. We also found that Chi-Square feature selection combined with the Neural Network classifier (CS+NN) consistently showed optimal performance. This study emphasizes the importance of data pre-processing and proper algorithm selection in the development of a machine learning-based thyroid cancer detection system.
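
The abstract names SMOTE as the class imbalance handling step but does not show how it works. The following is a minimal sketch of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours), written in plain Python. It is not the authors' implementation: the function name `smote`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (hypothetical, not from the study): 8 majority vs. 3 minority samples.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used for evaluation; the abstract does not state how the study handled this.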

Citation Format

Wantoro, A., Caesarendra, W., Syarif, A., Soetanto, H. (2025). Feature Selection and Class Imbalance Machine Learning for Early Detection of Thyroid Cancer Recurrence: A Performance-Based Analysis. https://doi.org/10.55981/jet.758

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.55981/jet.758
Access: Open Access ✓