Semantic Scholar · Open Access · 2023 · 57 citations

Multi-modal Queried Object Detection in the Wild

Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, +2 others

Abstract

We introduce MQ-Det, an efficient architecture and pre-training strategy that uses both textual descriptions, which offer open-set generalization, and visual exemplars, which offer rich descriptive granularity, as category queries. We call this setting Multi-modal Queried object Detection; it targets real-world detection with open-vocabulary categories and varying granularity. MQ-Det incorporates vision queries into existing, well-established language-queried-only detectors. A plug-and-play gated class-scalable perceiver module, built on top of the frozen detector, augments category text with class-wise visual information. To address the learning inertia problem caused by the frozen detector, we propose a vision-conditioned masked language prediction strategy. MQ-Det's simple yet effective architecture and training strategy are compatible with most language-queried object detectors, which makes it broadly applicable. Experimental results show that multi-modal queries substantially improve open-world detection. For instance, MQ-Det improves the state-of-the-art open-set detector GLIP by +7.8% AP on the LVIS benchmark via multi-modal queries without any downstream finetuning, and by +6.3% AP on average across 13 few-shot downstream tasks, with only 3% additional modulating time relative to that required by GLIP. Code is available at https://github.com/YifanXu74/MQ-Det.
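
To make the idea of augmenting category text with class-wise visual information more concrete, the following is a minimal sketch, not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that each category's text embedding attends over that category's own exemplar embeddings with scaled dot-product attention, and that the pooled result is added back through a scalar tanh gate. The function names, the gate form, and the toy embeddings are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_augment(text_emb, exemplar_embs, gate_param):
    """Add attention-pooled exemplar features to one category's text embedding.

    Assumption (not from the paper): a scalar tanh gate controls how much
    visual information is mixed into the text embedding.
    """
    dim = len(text_emb)
    scale = math.sqrt(dim)
    # The text embedding acts as the query; each exemplar is a key and a value.
    scores = [
        sum(t * v for t, v in zip(text_emb, exemplar)) / scale
        for exemplar in exemplar_embs
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * exemplar[i] for w, exemplar in zip(weights, exemplar_embs))
        for i in range(dim)
    ]
    gate = math.tanh(gate_param)
    return [t + gate * p for t, p in zip(text_emb, pooled)]


def augment_all(text_embs, exemplars_by_class, gate_param):
    """Apply the gated augmentation independently to every category."""
    return {
        name: gated_augment(text_embs[name], exemplars_by_class[name], gate_param)
        for name in text_embs
    }


# Toy 2-dimensional embeddings, invented for this example.
text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
exemplars = {"cat": [[0.9, 0.1], [0.8, 0.3]], "dog": [[0.2, 0.7]]}

unchanged = augment_all(text_embs, exemplars, gate_param=0.0)
augmented = augment_all(text_embs, exemplars, gate_param=1.0)
```

In this sketch a gate parameter of 0 leaves every text embedding untouched, so the language-only queries are recovered exactly. Because each category is processed independently, adding a category only adds one more dictionary entry.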

Authors (7)

Yifan Xu

Mengdan Zhang

Chaoyou Fu

Peixian Chen

Xiaoshan Yang

Ke Li

Changsheng Xu

Citation Format

Xu, Y., Zhang, M., Fu, C., Chen, P., Yang, X., Li, K. et al. (2023). Multi-modal Queried Object Detection in the Wild. https://doi.org/10.48550/arXiv.2305.18980

Quick Access

View at Source: doi.org/10.48550/arXiv.2305.18980
Journal Information
Publication Year: 2023
Language: en
Total Citations: 57
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2305.18980
Access: Open Access ✓