Semantic Scholar, Open Access, 2014, 481 citations

Video (language) modeling: a baseline for generative models of natural videos

Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michaël Mathieu, R. Collobert, S. Chopra

Abstract

We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences.
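As a rough illustration of the quantization step the abstract mentions, the sketch below turns image patches into discrete "words" so that a video becomes a sequence of token ids, which is the form a language-style model consumes. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of plain k-means to build the dictionary, the 8x8 patch size, the dictionary size of 16, the random toy video, and the helper names `extract_patches`, `fit_dictionary`, and `quantize`.

```python
import numpy as np

def extract_patches(frame, patch=8):
    """Split a 2-D grayscale frame into non-overlapping, flattened patches."""
    h, w = frame.shape
    h, w = h - h % patch, w - w % patch  # crop so the frame tiles evenly
    tiles = frame[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def quantize(patches, centroids):
    """Map each patch to the index of its nearest centroid (squared Euclidean)."""
    d = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def fit_dictionary(patches, k=16, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); an assumed way to build the dictionary."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(patches), size=k, replace=False)
    centroids = patches[start].astype(float)
    for _ in range(iters):
        ids = quantize(patches, centroids)
        for j in range(k):
            members = patches[ids == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

# Hypothetical toy "video": 4 random 32x32 grayscale frames.
rng = np.random.default_rng(0)
video = rng.random((4, 32, 32))

patches = np.concatenate([extract_patches(f) for f in video])
dictionary = fit_dictionary(patches, k=16)

# Each frame becomes a row of 16 token ids, one per 8x8 patch.
tokens = np.stack([quantize(extract_patches(f), dictionary) for f in video])
```

With the frames expressed as token ids, predicting a missing or future frame can be posed as predicting the next tokens in a sequence, which is what makes language-modeling techniques applicable.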

Authors (6)

Marc'Aurelio Ranzato

Arthur Szlam

Joan Bruna

Michaël Mathieu

R. Collobert

S. Chopra

Citation Format

Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R., Chopra, S. (2014). Video (language) modeling: a baseline for generative models of natural videos. https://www.semanticscholar.org/paper/355f98e4827a1b6ad3f29d07ea2bcf9ad078295c

Journal Information
Publication Year: 2014
Language: en
Total Citations: 481
Source Database: Semantic Scholar
Access: Open Access