Semantic Scholar · Open Access · 2021 · 3679 citations

Masked-attention Mask Transformer for Universal Image Segmentation

Bowen Cheng Ishan Misra A. Schwing Alexander Kirillov Rohit Girdhar

Abstract

Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
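
The masked attention described in the abstract can be illustrated with a small sketch. The NumPy code below is not the authors' implementation; it is a minimal single-head version that assumes each query comes with a boolean mask over pixel positions (True = inside the query's predicted mask) and that a query with an empty mask falls back to attending to all positions. The function name `masked_cross_attention` and all argument names are hypothetical.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, mask):
    """Single-head cross-attention restricted to each query's mask region.

    queries: (Q, d) query embeddings
    keys, values: (N, d) per-pixel image features
    mask: (Q, N) boolean, True where the pixel lies inside the query's predicted mask
    Returns (output of shape (Q, d), attention weights of shape (Q, N)).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # scaled dot-product scores, (Q, N)

    # Assumption: a query whose mask is empty attends everywhere,
    # so its softmax never runs over an all -inf row.
    allowed = np.where(mask.any(axis=1, keepdims=True), mask, True)

    # Positions outside the mask get -inf, i.e. zero weight after softmax.
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Toy usage: 2 queries, 5 pixels, feature dimension 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
m = np.array([[True, True, False, False, False],
              [False, False, False, False, False]])
out, w = masked_cross_attention(q, k, v, m)
```

In this toy call the first query places all of its attention weight on the first two pixels, while the second query, whose mask is empty, attends to all five.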

Topics & Keywords

Computer Science

Authors (5)

Bowen Cheng

Ishan Misra

A. Schwing

Alexander Kirillov

Rohit Girdhar

Citation Format

Cheng, B., Misra, I., Schwing, A., Kirillov, A., & Girdhar, R. (2021). Masked-attention Mask Transformer for Universal Image Segmentation. https://doi.org/10.1109/CVPR52688.2022.00135

Quick Access

View at Source: doi.org/10.1109/CVPR52688.2022.00135

Journal Information

Publication Year: 2021
Language: en
Total Citations: 3679
Source Database: Semantic Scholar
DOI: 10.1109/CVPR52688.2022.00135
Access: Open Access ✓