GSMT: An explainable semi-supervised multi-label method based on Gower distance
Abstract
The financial, health, and education sectors produce vast amounts of data daily. Labeling entries such as assets, patients, and students is both costly and complex because databases have evolved into multi-label settings. Handling real-world data requires automatic labeling, to circumvent slow manual procedures, and explanations, to comply with regulations. In this work, we introduce GSMT, an inductive Explainable Semi-Supervised Multi-Label Random Forest Method based on Gower Distance, which uses supervised and unsupervised data to provide a non-linear solution, primarily for tabular multi-label datasets with fully unknown label vectors. GSMT splits the dataset using multi-dimensional manifolds, completes missing label information, and inductively predicts new observations while achieving explainability. We demonstrate state-of-the-art performance across Micro F1 Score, AUPRC, AUROC, and Label Rank Average Precision in a study involving 20 numerical and 5 mostly categorical datasets with five missing data ratios. By leveraging unsupervised information on top of numerical and categorical data, GSMT outputs pattern rules annotated with performance measures, explanations in attribute and label space, and an inductive model capable of predicting multi-label observations.
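For readers unfamiliar with the Gower distance named in the title, the following is a minimal sketch of its standard textbook definition for mixed numerical and categorical records. It is not taken from the GSMT implementation; the function name `gower_distance`, the `ranges` argument, and the sample records are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Standard Gower distance between two mixed-type records.

    ranges[k] holds the observed range (max - min) of numerical
    feature k, or None when feature k is categorical.
    This is an illustrative sketch, not the GSMT authors' code.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: 0 on a match, 1 on a mismatch.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numerical feature: absolute difference scaled by the range.
            total += abs(x - y) / r
        # A numerical feature with zero range contributes nothing.
    # Average the per-feature dissimilarities.
    return total / len(ranges)


# Hypothetical records: (age, region, income).
record_a = (30.0, "urban", 50000.0)
record_b = (40.0, "rural", 50000.0)
feature_ranges = (50.0, None, 100000.0)

distance = gower_distance(record_a, record_b, feature_ranges)
print(distance)
```

Because each feature contributes a value between 0 and 1, the result stays in the interval [0, 1] regardless of how many numerical and categorical attributes are mixed, which is what makes the measure convenient for tabular data of the kind the abstract describes.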
Authors (3)
José Carlos Mondragón
Andres Eduardo Gutierrez-Rodríguez
Victor Adrián Sosa Hernández
- Year Published: 2025
- Source Database: DOAJ
- DOI: 10.1016/j.array.2025.100596
- Access: Open Access