Semantic Scholar Open Access 2014 481 citations

Video (language) modeling: a baseline for generative models of natural videos

Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michaël Mathieu, R. Collobert, +1 more

Abstract

We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences.
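The quantization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a patch dictionary is already available and maps each image patch to the index of its nearest entry, so that a frame becomes a sequence of discrete tokens that a language-model-style predictor could consume. The names `extract_patches`, `quantize`, `frame_to_tokens`, the toy dictionary, and the patch size are hypothetical and chosen for illustration; the abstract does not specify how the dictionary is built or how large the patches are.

```python
from typing import List, Sequence

Patch = List[float]


def extract_patches(frame: Sequence[Sequence[float]], size: int) -> List[Patch]:
    """Split a 2-D frame into non-overlapping size x size patches.

    Each patch is flattened row by row. Non-overlapping tiling is an
    assumption made here to keep the example small.
    """
    rows, cols = len(frame), len(frame[0])
    patches: List[Patch] = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append(
                [frame[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            )
    return patches


def quantize(patch: Patch, dictionary: Sequence[Patch]) -> int:
    """Return the index of the dictionary entry nearest to the patch
    under squared Euclidean distance."""
    def dist(centroid: Patch) -> float:
        return sum((p - q) ** 2 for p, q in zip(patch, centroid))

    return min(range(len(dictionary)), key=lambda k: dist(dictionary[k]))


def frame_to_tokens(
    frame: Sequence[Sequence[float]], dictionary: Sequence[Patch], size: int
) -> List[int]:
    """Convert a frame into a list of discrete patch tokens."""
    return [quantize(p, dictionary) for p in extract_patches(frame, size)]


# Hypothetical 3-entry dictionary of flattened 2x2 patches. In practice a
# dictionary would be learned from data (for example with k-means) and
# would be far larger.
toy_dictionary = [
    [0.0, 0.0, 0.0, 0.0],  # dark patch
    [1.0, 1.0, 1.0, 1.0],  # bright patch
    [0.0, 1.0, 0.0, 1.0],  # vertical edge
]

toy_frame = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 1.0, 0.1, 0.9],
    [0.1, 0.9, 0.0, 1.0],
]

tokens = frame_to_tokens(toy_frame, toy_dictionary, 2)
print(tokens)  # [0, 1, 2, 2]
```

Once frames are expressed as token sequences like this, predicting a missing or future frame reduces to predicting discrete symbols, which is what makes language modeling techniques applicable.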

Topics & Keywords

Computer Science

Authors (6)

Marc'Aurelio Ranzato

Arthur Szlam

Joan Bruna

Michaël Mathieu

R. Collobert

S. Chopra

Citation Format

Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R., Chopra, S. (2014). Video (language) modeling: a baseline for generative models of natural videos. https://www.semanticscholar.org/paper/355f98e4827a1b6ad3f29d07ea2bcf9ad078295c

Journal Information

Year Published: 2014
Language: en
Total Citations: 481
Source Database: Semantic Scholar
Access: Open Access