Semantic Scholar · Open Access · 2023 · 57 citations

Multi-modal Queried Object Detection in the Wild

Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, +2 others

Abstract

We introduce MQ-Det, an efficient architecture and pre-training strategy that uses both textual descriptions, which offer open-set generalization, and visual exemplars, which offer rich description granularity, as category queries. We call this Multi-modal Queried object Detection; it targets real-world detection with both open-vocabulary categories and varying granularity. MQ-Det incorporates vision queries into existing, well-established language-queried-only detectors. A plug-and-play gated class-scalable perceiver module on top of the frozen detector is proposed to augment category text with class-wise visual information. To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed. MQ-Det's simple yet effective architecture and training strategy are compatible with most language-queried object detectors, thus yielding versatile applications. Experimental results demonstrate that multi-modal queries largely boost open-world detection. For instance, MQ-Det significantly improves the state-of-the-art open-set detector GLIP by +7.8% AP on the LVIS benchmark via multi-modal queries without any downstream finetuning, and by +6.3% AP on average across 13 few-shot downstream tasks, with only an additional 3% of the modulating time required by GLIP. Code is available at https://github.com/YifanXu74/MQ-Det.
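
The idea of augmenting category text with class-wise visual information through a gate can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal, assumed form of gated cross-attention in plain Python. The function name `gated_augment`, the scalar `tanh` gate, and the choice of starting the gate at zero so that the frozen detector's text embedding is initially unchanged are all illustrative assumptions, not details given in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_augment(text_emb, exemplars, gate):
    """Hypothetical helper: add gated visual information to one category's text embedding.

    text_emb  : list[float], the category's text embedding (length d)
    exemplars : list[list[float]], visual exemplar embeddings for the same category
    gate      : float, a learnable scalar in a real model (assumed form: tanh gate)
    """
    d = len(text_emb)
    # Scaled dot-product attention: the text embedding queries its own class's exemplars.
    scores = [
        sum(t * v for t, v in zip(text_emb, ex)) / math.sqrt(d)
        for ex in exemplars
    ]
    weights = softmax(scores)
    # Attention-weighted average of the exemplar embeddings.
    pooled = [sum(w * ex[i] for w, ex in zip(weights, exemplars)) for i in range(d)]
    # Gated residual: with gate = 0 the output equals the original text embedding.
    g = math.tanh(gate)
    return [t + g * p for t, p in zip(text_emb, pooled)]


# Toy inputs (made up for illustration only).
text_emb = [1.0, 0.0]
exemplars = [[0.0, 1.0], [0.0, 3.0]]

unchanged = gated_augment(text_emb, exemplars, gate=0.0)
augmented = gated_augment(text_emb, exemplars, gate=1.0)
```

Because attention is computed separately for each category over that category's own exemplars, a sketch like this scales with the number of classes by simply calling it once per class, which is one plausible reading of "class-scalable".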

Topics & Keywords

Computer Science

Authors (7)

Yifan Xu

Mengdan Zhang

Chaoyou Fu

Peixian Chen

Xiaoshan Yang

Ke Li

Changsheng Xu

Citation Format

Xu, Y., Zhang, M., Fu, C., Chen, P., Yang, X., Li, K. et al. (2023). Multi-modal Queried Object Detection in the Wild. https://doi.org/10.48550/arXiv.2305.18980

Quick Access

View at Source: doi.org/10.48550/arXiv.2305.18980

Journal Information

Year Published: 2023
Language: en
Total Citations: 57
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2305.18980
Access: Open Access ✓