arXiv Open Access 2024

Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset

Abbi Ward, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley Carrick, +15 others

Abstract

Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, impacting research, medical education, and artificial intelligence (AI) tool development. Dermatology is a suitable area to develop and test a new and scalable method to create representative health datasets. Methods: We used Google Search advertisements to invite contributions to an open access dataset of images of dermatology conditions, demographic and symptom information. With informed contributor consent, we describe and release this dataset containing 10,408 images from 5,033 contributions from internet users in the United States over 8 months starting March 2023. The dataset includes dermatologist condition labels as well as estimated Fitzpatrick Skin Type (eFST) and Monk Skin Tone (eMST) labels for the images. Results: We received a median of 22 submissions/day (IQR 14-30). Female (66.72%) and younger (52% < age 40) contributors had a higher representation in the dataset compared to the US population, and 32.6% of contributors reported a non-White racial or ethnic identity. Over 97.5% of contributions were genuine images of skin conditions. Dermatologist confidence in assigning a differential diagnosis increased with the number of available variables, and showed a weaker correlation with image sharpness (Spearman's P values <0.001 and 0.01, respectively). Most contributions were short-duration (54% with onset < 7 days ago) and 89% were allergic, infectious, or inflammatory conditions. eFST and eMST distributions reflected the geographical origin of the dataset. The dataset is available at github.com/google-research-datasets/scin. Conclusion: Search ads are effective at crowdsourcing images of health conditions. The SCIN dataset bridges important gaps in the availability of representative images of common skin conditions.

Authors (20)

Abbi Ward
Jimmy Li
Julie Wang
Sriram Lakshminarasimhan
Ashley Carrick
Bilson Campana
Jay Hartford
Pradeep Kumar S
Tiya Tiyasirichokchai
Sunny Virmani
Renee Wong
Yossi Matias
Greg S. Corrado
Dale R. Webster
Dawn Siegel
Steven Lin
Justin Ko
Alan Karthikesalingam
Christopher Semturs
Pooja Rao

Citation Format

Ward, A., Li, J., Wang, J., Lakshminarasimhan, S., Carrick, A., Campana, B. et al. (2024). Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset. https://arxiv.org/abs/2402.18545

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access