Evaluation of Boosting Algorithms for Skin Cancer Classification Using the PAD-UFES-20 Dataset and Custom CNN Feature Extraction
Abstract
Early and reliable detection of skin cancer is critical for improving patient outcomes and minimizing diagnostic uncertainty in dermatological practice. This study proposes an interpretable hybrid framework that integrates ConvMixer-based deep feature extraction with gradient boosting classifiers to perform multi-class skin lesion classification on the publicly available PAD-UFES-20 dataset. The dataset contains 2298 dermoscopic and clinical images with associated patient metadata (age, gender, and anatomical site), enabling a joint evaluation of demographic and anatomical factors influencing model performance. After data augmentation, normalization, and class balancing using Borderline-SMOTE, image embeddings extracted via ConvMixer were integrated with patient metadata and subsequently classified using CatBoost, XGBoost, and LightGBM. Among these, CatBoost achieved the highest macro-AUC of 0.94 and macro-F1 of 0.88, with a melanoma sensitivity of 0.91, while maintaining good calibration (Brier score = 0.06). Grad-CAM and SHAP analyses confirmed that the model’s attention and feature importance correspond to clinically relevant lesion regions and attributes. The results highlight that age and body-region imbalances in the PAD-UFES-20 dataset modestly influence predictive behavior, emphasizing the importance of balanced sampling and stratified validation. Overall, the proposed ConvMixer–CatBoost framework provides a compact, explainable, and generalizable solution for AI-assisted skin cancer classification.
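The abstract reports macro-F1, per-class sensitivity, and a Brier score without stating how they were computed. The following is a minimal sketch, using only the Python standard library, of one common way to compute these quantities for a multi-class classifier. Everything in it is an assumption for illustration: the three-class toy labels, the probability rows, the function names, and in particular the choice of the multi-class Brier variant (mean over samples of the summed squared error against the one-hot label). It is not the authors' evaluation code and does not reproduce their numbers.

```python
# Illustrative only: toy labels and probabilities, not PAD-UFES-20 data.
CLASSES = ["MEL", "BCC", "NEV"]  # hypothetical subset of lesion classes

y_true = ["MEL", "MEL", "BCC", "BCC", "NEV", "NEV"]
probs = [  # one row per sample, columns ordered as CLASSES
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]


def argmax_labels(prob_rows, classes):
    """Pick the class with the highest predicted probability per sample."""
    return [classes[row.index(max(row))] for row in prob_rows]


def recall_and_f1(y_true, y_pred, classes):
    """Return {class: (recall, f1)} from one-vs-rest counts."""
    stats = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[c] = (recall, f1)
    return stats


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    stats = recall_and_f1(y_true, y_pred, classes)
    return sum(f1 for _, f1 in stats.values()) / len(classes)


def multiclass_brier(y_true, prob_rows, classes):
    """Mean over samples of sum_k (p_k - onehot_k)^2 (assumed variant)."""
    total = 0.0
    for t, row in zip(y_true, prob_rows):
        total += sum(
            (p - (1.0 if c == t else 0.0)) ** 2 for c, p in zip(classes, row)
        )
    return total / len(y_true)


y_pred = argmax_labels(probs, CLASSES)
melanoma_sensitivity = recall_and_f1(y_true, y_pred, CLASSES)["MEL"][0]
f1_macro = macro_f1(y_true, y_pred, CLASSES)
brier = multiclass_brier(y_true, probs, CLASSES)
```

Sensitivity here is simply recall for the melanoma class. Macro averaging is the relevant choice for an imbalanced dataset because each class contributes equally regardless of how many samples it has.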
Authors (5)
Danish Javed
Usama Arshad
Haider Irfan
Raja Hashim Ali
Talha Ali Khan
Quick Access
- Publication Year: 2025
- Source Database: DOAJ
- DOI: 10.3390/engproc2025087115
- Access: Open Access ✓