DOAJ Open Access 2023

A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles

Durga Prasad Bavirisetti, Herman Ryen Martinsen, Gabriel Hanssen Kiss, Frank Lindseth

Abstract

In this paper, we investigate the use of Vision Transformers for processing and understanding visual data in an autonomous driving setting. Specifically, we explore the use of Vision Transformers for semantic segmentation and monocular depth estimation using only a single image as input. We present state-of-the-art Vision Transformers for these tasks and combine them into a multitask model. Through multiple experiments on four different street image datasets, we demonstrate that the multitask approach significantly reduces inference time while maintaining high accuracy for both tasks. Additionally, we show that changing the size of the Transformer-based backbone can be used as a trade-off between inference speed and accuracy. Furthermore, we investigate the use of synthetic data for pre-training and show that it effectively increases the accuracy of the model when real-world data is limited.

Topics & Keywords

Transportation engineering; Transportation and communications

Citation

Bavirisetti, D. P., Martinsen, H. R., Kiss, G. H., & Lindseth, F. (2023). A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles. https://doi.org/10.1109/OJITS.2023.3335648

Journal Information

Year of Publication: 2023
Source Database: DOAJ
DOI: 10.1109/OJITS.2023.3335648
Access: Open Access