BioCLIP: A Vision Foundation Model for the Tree of Life
Abstract
Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BioCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BioCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability. imageomics.github.io/bioclip has models, data and code.
Topics & Keywords
Authors (12)
Samuel Stevens
Jiaman Wu
Matthew J. Thompson
Elizabeth G. Campolongo
Chan Hee Song
David E. Carlyn
Li Dong
W. Dahdul
Charles V. Stewart
Tanya Y. Berger-Wolf
Wei-Lun Chao
Yu Su
Quick Access
- Publication Year: 2023
- Language: en
- Total Citations: 205
- Source Database: Semantic Scholar
- DOI: 10.1109/CVPR52733.2024.01836
- Access: Open Access ✓