CrossRef Open Access 2026

AI and paleontology: effects of vertebrate fossil sample size on machine learning image classification

Bruce J. MacFadden, Cristobal A. Barberis, Maria C. Vallejo-Pareja, Samantha P. Zbinden, Victor J. Perez, and 4 others

Abstract

With the growing application of artificial intelligence (AI) and machine learning (ML), great potential exists to leverage these technologies in paleontology. Compared with many other scientific fields, paleontology poses a particular challenge for ML: small sample sizes, especially for fossil vertebrates. Shark teeth are abundant in the fossil record and provide a model system for applying ML across varying sample sizes. Here we use six classes (taxa) of Neogene shark teeth for taxonomic identification, drawing on a curated dataset of 3150 images. Each class was evaluated using an 80% training and 20% validation split, with a separate, external test set of 25 samples per class. Pretrained models perform well (accuracy > 90%), which provides a strong baseline for classification. However, enabling fine-tuning of the ML model to identify fossil shark teeth improves performance considerably. Sample size per class also affects the accuracy of the models’ classifications. With small samples (n = 50 individuals per class), mean accuracy was 93.4%, and accuracy plateaued at ~99% between 200 and 500 images per class. Confidence likewise increases with larger samples, from 81.8% (n = 50 individuals per class) to >90% (n = 300 to 500 individuals per class). Misidentifications followed consistent patterns that reflect morphological similarities and/or poor preservation. Artificially enlarging the training datasets through data augmentation improves the confidence of identifications. This research indicates that relatively small samples of vertebrate species (~50 to 500 individuals per class) can effectively train an ML model to identify these shark teeth with high accuracy.
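The abstract describes an 80/20 training/validation split applied within each class, followed by reporting accuracy and confidence on a separate test set. Both steps can be sketched as shown here. This is a minimal illustration and not the authors' pipeline. The function names `split_per_class` and `accuracy_and_confidence` are hypothetical. The test set is assumed to be supplied separately, as the abstract describes an external test set. "Confidence" is assumed to mean the average top-class softmax probability, which the abstract does not define.

```python
import math
import random
from collections import defaultdict


def split_per_class(samples, train_frac=0.8, seed=0):
    """Split (item, label) pairs into training/validation sets class by class.

    Hypothetical helper: shuffling and cutting within each class keeps every
    taxon at the same 80/20 ratio, regardless of how many images it has.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_class):
        group = list(by_class[label])
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val


def softmax(logits):
    """Convert raw model scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_and_confidence(logits_list, true_labels):
    """Return (accuracy, mean top-class probability) over a test set.

    Assumption: "confidence" is the softmax probability of the predicted
    class, averaged over all test images.
    """
    n = len(true_labels)
    if n == 0 or n != len(logits_list):
        raise ValueError("need one prediction per test label")
    correct = 0
    conf_sum = 0.0
    for logits, y in zip(logits_list, true_labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred == y:
            correct += 1
        conf_sum += probs[pred]
    return correct / n, conf_sum / n
```

For example, with 50 images in each of six classes, `split_per_class` yields 40 training and 10 validation images per class. Splitting within each class, rather than over the pooled dataset, keeps small classes from being under-represented in either subset.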

Authors (9)

Bruce J. MacFadden
Cristobal A. Barberis
Maria C. Vallejo-Pareja
Samantha P. Zbinden
Victor J. Perez
Stephanie R. Killingsworth
Kenneth W. Marks
Dévi Hall
Arthur Porto

Citation Format

MacFadden, B.J., Barberis, C.A., Vallejo-Pareja, M.C., Zbinden, S.P., Perez, V.J., Killingsworth, S.R. et al. (2026). AI and paleontology: effects of vertebrate fossil sample size on machine learning image classification. https://doi.org/10.1017/pab.2025.10084

Journal Information
Year of Publication: 2026
Language: English (en)
Source Database: CrossRef
DOI: 10.1017/pab.2025.10084
Access: Open Access