Semantic Scholar · Open Access · 2019 · 680 citations

VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

Xin Eric Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-fang Wang, William Yang Wang

Abstract

We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset, VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context. Extensive experiments on the VATEX dataset show that, first, the unified multilingual model can not only produce both English and Chinese descriptions for a video more efficiently, but also offer improved performance over the monolingual models. Furthermore, we demonstrate that the spatiotemporal video context can be effectively utilized to align source and target languages and thus assist machine translation. Finally, we discuss the potential of using VATEX for other video-and-language research.

Topics & Keywords

Computer Science

Authors (6)

Xin Eric Wang

Jiawei Wu

Junkun Chen

Lei Li

Yuan-fang Wang

William Yang Wang

Citation

Wang, X. E., Wu, J., Chen, J., Li, L., Wang, Y.-F., & Wang, W. Y. (2019). VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. https://doi.org/10.1109/ICCV.2019.00468

Publication Information

Publication Year: 2019
Language: en
Total Citations: 680
Source Database: Semantic Scholar
DOI: 10.1109/ICCV.2019.00468
Access: Open Access