Lithological mapping with pseudo-labelling: Promise or overestimation in data-scarce settings?
Abstract
Reference data are the most crucial component of model building, yet in geoscience they are often scarce. Pseudo-labelling (PL), i.e. incorporating samples classified with high probability into the model-building process, offers a potential solution. We aimed to assess the efficiency of PL in lithological mapping in a vegetation-free arid region of Sudan. Multivariate Adaptive Regression Splines (MARS) and Random Forest (RF) were used to classify a Landsat 9 image. Reference data were collected during fieldwork and through visual interpretation. Image processing yielded classified maps with associated probability layers, from which 1000 additional training samples (PL data) were extracted at a probability of 95 percent. A detailed accuracy assessment was conducted, and accuracy measures were evaluated using statistical analysis and visual inspection. MARS was found to be an ambiguous classifier because its probabilities were too optimistic relative to the overall accuracy (OA): 81% of samples had a probability above 99% (OA = 98.2%), compared with 21% above 99% for RF (OA = 98.1%); that is, despite the much higher probabilities, the difference in accuracy was only 0.1 percentage points. At the class level, the correlation between probability and the F1-score was low (0.21). The original and PL-based models produced different maps, with improved accuracy for the PL-based versions, although the new model versions showed lower probability values for both classifiers. Visual inspection proved essential for better insight into the spatial patterns: expert knowledge is crucial for checking the occurrence of rock types and identifying false classifications. The main finding is that probability should be handled carefully, as it does not guarantee high model performance in classification, although the PL approach can lead to more reliable maps.
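The pseudo-labelling workflow summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier`, synthetic data from `make_classification` in place of Landsat 9 pixels, a "greater than or equal to 0.95" reading of the probability threshold, and random sampling of at most 1000 candidates. The helper name `pseudo_label` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def pseudo_label(model, X_train, y_train, X_unlabelled,
                 threshold=0.95, n_max=1000, seed=0):
    """Fit `model`, pick confident unlabelled samples, refit a copy on the augmented set."""
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_unlabelled)
    top = proba.max(axis=1)                       # highest class probability per sample
    pred = model.classes_[proba.argmax(axis=1)]   # predicted label per sample

    # Assumption: "at 95 percent probability" is read as top probability >= 0.95.
    candidates = np.flatnonzero(top >= threshold)
    if candidates.size > n_max:
        rng = np.random.default_rng(seed)
        candidates = rng.choice(candidates, size=n_max, replace=False)

    # Augment the reference data with the pseudo-labelled samples and refit.
    X_aug = np.vstack([X_train, X_unlabelled[candidates]])
    y_aug = np.concatenate([y_train, pred[candidates]])
    model_pl = clone(model).fit(X_aug, y_aug)
    return model_pl, candidates, top


# Synthetic stand-in for spectral bands (features) and rock types (classes).
X, y = make_classification(n_samples=3000, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
X_lab, X_rest, y_lab, y_rest = train_test_split(
    X, y, train_size=200, stratify=y, random_state=0)
X_pool, X_test, _, y_test = train_test_split(
    X_rest, y_rest, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
model_pl, picked, top_prob = pseudo_label(rf, X_lab, y_lab, X_pool)

oa_base = rf.score(X_test, y_test)        # overall accuracy, original model
oa_pl = model_pl.score(X_test, y_test)    # overall accuracy, PL-based model
share_above_99 = float(np.mean(top_prob > 0.99))

print(f"pseudo-labelled samples added: {picked.size}")
print(f"OA original: {oa_base:.3f}, OA with PL: {oa_pl:.3f}")
print(f"share of pool samples with probability > 0.99: {share_above_99:.2%}")
```

Comparing `share_above_99` with the overall accuracies mirrors the abstract's caution: a large share of very high probabilities does not by itself indicate a more accurate model, so the two quantities are worth reporting side by side.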
Authors (6)
Szilárd Szabó
Abdelmajeed A. Elrasheed
Lilla Kovács
Imre J. Holb
Szilárd B. Likó
Dávid Abriha
Quick Access
- Publication Year: 2025
- Source Database: DOAJ
- DOI: 10.15201/hungeobull.74.4.1
- Access: Open Access ✓