Semantic Scholar · Open Access · 2022 · 1830 citations

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima +3 more

Abstract

3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse the history BEV information. Our approach achieves the new state-of-the-art 56.9% in terms of NDS metric on the nuScenes test set, which is 9.0 points higher than previous best arts and on par with the performance of LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
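The abstract describes grid-shaped BEV queries that aggregate multi-camera features through spatial cross-attention. BEVFormer's actual mechanism is deformable (each query samples only its regions of interest in the views it projects into); as a rough, simplified illustration only, the NumPy sketch below uses dense cross-attention between a flattened BEV query grid and flattened camera feature tokens. All shapes, names, and the dense-attention simplification are illustrative assumptions, not BEVFormer's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bev_cross_attention(bev_queries, cam_feats):
    """Toy spatial cross-attention: each BEV query attends over
    all camera feature tokens (dense stand-in for BEVFormer's
    deformable sampling at projected reference points).

    bev_queries: (H*W, C) grid-shaped BEV queries, flattened
    cam_feats:   (num_cams, L, C) per-camera feature tokens
    returns:     (H*W, C) updated BEV representation
    """
    kv = cam_feats.reshape(-1, cam_feats.shape[-1])        # (num_cams*L, C)
    scores = bev_queries @ kv.T / np.sqrt(kv.shape[-1])    # (H*W, num_cams*L)
    attn = softmax(scores, axis=-1)                        # rows sum to 1
    return attn @ kv                                       # weighted feature mix

H, W, C = 4, 4, 8                       # tiny BEV grid for illustration
rng = np.random.default_rng(0)
bev = rng.normal(size=(H * W, C))       # predefined grid-shaped BEV queries
feats = rng.normal(size=(6, 10, C))     # 6 cameras, 10 tokens each
out = bev_cross_attention(bev, feats)
print(out.shape)  # (16, 8)
```

The paper's temporal self-attention would then recurrently fuse `out` with the previous frame's BEV grid in the same attention style; that step is omitted here for brevity.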


Authors (8)

Zhiqi Li

Wenhai Wang

Hongyang Li

Enze Xie

Chonghao Sima

Tong Lu

Yu Qiao

Jifeng Dai

Citation Format

Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T. et al. (2022). BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. https://doi.org/10.48550/arXiv.2203.17270

Quick Access

View at Source: doi.org/10.48550/arXiv.2203.17270
Journal Information
Publication Year
2022
Language
en
Total Citations
1830×
Source Database
Semantic Scholar
DOI
10.48550/arXiv.2203.17270
Access
Open Access ✓