Semantic Scholar Open Access 2015 544 citations

Jointly Modeling Embedding and Translation to Bridge Video and Language

Yingwei Pan Tao Mei Ting Yao Houqiang Li Y. Rui

Abstract

Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention in visual interpretation. However, most existing approaches generate each word locally, given the previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but semantically wrong (e.g., in their subjects, verbs, or objects). This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which simultaneously explores the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and the visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. Experiments on the YouTube2Text dataset show that the proposed LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performance is also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
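
The two objectives described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that the sentence-level term is a squared distance between a video vector and a sentence vector in the shared embedding space, that the word-level term is a negative log-likelihood, and that the two are combined as a weighted sum. The function names, the weight `lam`, and the weighted-sum form are illustrative assumptions and are not stated in the abstract.

```python
import math


def relevance_loss(video_emb, sentence_emb):
    """Squared Euclidean distance between a video vector and a sentence
    vector that are assumed to live in the same embedding space."""
    return sum((v - s) ** 2 for v, s in zip(video_emb, sentence_emb))


def coherence_loss(word_probs):
    """Negative log-likelihood of the reference sentence.

    word_probs[t] is the probability the model assigns to the correct word
    at step t, given the previous words and the visual content.
    """
    return -sum(math.log(p) for p in word_probs)


def joint_loss(video_emb, sentence_emb, word_probs, lam=0.5):
    """Weighted sum of both terms; lam is a hypothetical trade-off weight."""
    return ((1.0 - lam) * relevance_loss(video_emb, sentence_emb)
            + lam * coherence_loss(word_probs))
```

With `lam=1.0` only the word-by-word likelihood term remains, and with `lam=0.0` only the sentence-level embedding term remains, so the weight makes the trade-off between local and holistic supervision explicit.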

Authors (5)

Yingwei Pan

Tao Mei

Ting Yao

Houqiang Li

Y. Rui

Citation Format

Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y. (2015). Jointly Modeling Embedding and Translation to Bridge Video and Language. https://doi.org/10.1109/CVPR.2016.497

Quick Access

View at Source: doi.org/10.1109/CVPR.2016.497

Journal Information
Publication Year: 2015
Language: en
Total Citations: 544
Database Source: Semantic Scholar
DOI: 10.1109/CVPR.2016.497
Access: Open Access ✓