Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding
Abstract
Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, efficiently encoding them remains an open problem, particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encoder that maps long video histories into short embeddings, pretrained with a frame query objective that learns to attend to content features at arbitrary temporal positions. The pretraining stage provides the encoder with dense history coverage on large-scale video data; the subsequent finetuning stage adapts the pretrained encoder under an autoregressive video generation objective to establish content-level consistency. In this way, the lightweight embeddings achieve performance comparable to heavier alternatives. We evaluate the framework with ablative settings and discuss the architecture designs.
Authors (9)
Lvmin Zhang
Shengqu Cai
Muyang Li
Chong Zeng
Beijia Lu
Anyi Rao
Song Han
Gordon Wetzstein
Maneesh Agrawala
Quick Access
- Publication Year: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access ✓