arXiv
Open Access
2015
Where To Look: Focus Regions for Visual Question Answering
Kevin J. Shih
Saurabh Singh
Derek Hoiem
Abstrak
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the VQA dataset which is the largest human-annotated visual question answering dataset to our knowledge.
Topik & Kata Kunci
Penulis (3)
K
Kevin J. Shih
S
Saurabh Singh
D
Derek Hoiem
Akses Cepat
Informasi Jurnal
- Tahun Terbit
- 2015
- Bahasa
- en
- Sumber Database
- arXiv
- Akses
- Open Access ✓