Semantic Scholar · Open Access · 2023 · 7509 citations

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, +21 others

Abstract

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.

Topics & Keywords

Computer Science

Authors (26)

M. Oquab

Timothée Darcet

Théo Moutakanni

Huy V. Vo

Marc Szafraniec

Vasil Khalidov

Pierre Fernandez

Daniel Haziza

Francisco Massa

Alaaeldin El-Nouby

Mahmoud Assran

Nicolas Ballas

Wojciech Galuba

Russ Howes

Po-Yao (Bernie) Huang

Shang-Wen Li

Ishan Misra

Michael G. Rabbat

Vasu Sharma

Gabriel Synnaeve

Hu Xu

H. Jégou

J. Mairal

Patrick Labatut

Armand Joulin

Piotr Bojanowski

Citation (APA)

Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V. et al. (2023). DINOv2: Learning Robust Visual Features without Supervision. https://doi.org/10.48550/arXiv.2304.07193

Journal Information

Publication Year: 2023
Language: en
Total Citations: 7509
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2304.07193
Access: Open Access