Semantic Scholar Open Access 2016 1077 citations

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, +3 others

Abstract

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial contains 1 dialog (10 question-answer pairs) on ~140k images from the COCO dataset, with a total of ~1.4M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders (Late Fusion, Hierarchical Recurrent Encoder and Memory Network) and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as the mean reciprocal rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Our dataset, code, and trained models will be released publicly at https://visualdialog.org. Putting it all together, we demonstrate the first visual chatbot!
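
The retrieval-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the agent assigns one score per candidate answer and that the index of the human response is known; the function names, the toy scores, and the pessimistic tie-breaking rule are assumptions made for illustration and are not taken from the authors' released code.

```python
from typing import Sequence


def rank_of_human_response(scores: Sequence[float], human_index: int) -> int:
    """Return the 1-based rank of the human response among the candidates.

    Candidates are ordered by descending score. Ties are counted against the
    human response (an assumption of this sketch, not a rule from the paper).
    """
    target = scores[human_index]
    ahead = sum(
        1 for i, s in enumerate(scores) if i != human_index and s >= target
    )
    return ahead + 1


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all evaluated questions."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical toy data: (candidate scores, index of the human response).
examples = [
    ([0.1, 0.7, 0.2], 1),       # human response scored highest: rank 1
    ([0.9, 0.3, 0.5, 0.1], 2),  # one candidate scored higher: rank 2
    ([0.4, 0.6, 0.8, 0.2], 0),  # two candidates scored higher: rank 3
]

ranks = [rank_of_human_response(scores, idx) for scores, idx in examples]
mrr = mean_reciprocal_rank(ranks)
print(ranks, round(mrr, 4))
```

A higher mean reciprocal rank means the agent tends to place the human response nearer the top of its sorted candidate list; a value of 1.0 would mean it always ranks the human response first.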

Authors (8)

Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra

Citation Format

Das, A., Kottur, S., Gupta, K., Singh, A., Yadav, D., Moura, J.M.F. et al. (2016). Visual Dialog. https://doi.org/10.1109/CVPR.2017.121

Journal Information

Publication Year: 2016
Language: en
Total Citations: 1077
Database Source: Semantic Scholar
DOI: 10.1109/CVPR.2017.121
Access: Open Access