arXiv Open Access 2023

GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition

Vikram V. Ramaswamy, Sing Yu Lin, Dora Zhao, Aaron B. Adcock, Laurens van der Maaten +2 others

Abstract

Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset with 61,940 images from 40 classes and 6 world regions, with no personally identifiable information, collected by soliciting images from people around the world. We analyse GeoDE to understand differences in images collected in this manner compared to web-scraping. We demonstrate its use as both an evaluation and training dataset, allowing us to highlight and begin to mitigate the shortcomings in current models, despite GeoDE's relatively small size. We release the full dataset and code at https://geodiverse-data-collection.cs.princeton.edu
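The abstract describes using a region-labelled dataset to surface where current models fall short. Below is a minimal sketch of one common way to do that, a per-region accuracy breakdown. It assumes each evaluation record carries a region tag, a ground-truth class, and a model prediction. The function names, region names, and labels are hypothetical placeholders and do not come from the GeoDE release or its code.

```python
from collections import defaultdict


def per_region_accuracy(records):
    """Compute top-1 accuracy separately for each region.

    records: iterable of (region, true_label, predicted_label) tuples
    (a hypothetical record layout, not GeoDE's file format).
    Returns a dict mapping region -> accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for region, truth, pred in records:
        total[region] += 1
        if truth == pred:
            correct[region] += 1
    return {region: correct[region] / total[region] for region in total}


def accuracy_gap(acc_by_region):
    """Difference between the best- and worst-performing region."""
    return max(acc_by_region.values()) - min(acc_by_region.values())


# Toy, made-up predictions for illustration only (not GeoDE results).
toy = [
    ("region_1", "label_a", "label_a"),
    ("region_1", "label_b", "label_b"),
    ("region_2", "label_a", "label_a"),
    ("region_2", "label_b", "label_c"),
]
acc = per_region_accuracy(toy)
print(acc)                # {'region_1': 1.0, 'region_2': 0.5}
print(accuracy_gap(acc))  # 0.5
```

A single aggregate accuracy would report 0.75 here and hide the fact that one region is served much worse than the other. Reporting the per-region values, or the gap between them, makes that disparity visible.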


Authors (7)

Vikram V. Ramaswamy
Sing Yu Lin
Dora Zhao
Aaron B. Adcock
Laurens van der Maaten
Deepti Ghadiyaram
Olga Russakovsky

Citation

Ramaswamy, V.V., Lin, S.Y., Zhao, D., Adcock, A.B., Maaten, L.v.d., Ghadiyaram, D. et al. (2023). GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition. https://arxiv.org/abs/2301.02560

Publication Information
Year Published: 2023
Language: English
Database Source: arXiv
Access: Open Access