arXiv Open Access 2025

Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding

Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, +4 others

Abstract

Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, encoding them efficiently remains an open problem, particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encoder that maps long video histories into short embeddings. The encoder is pretrained with a frame query objective that teaches it to attend to content features at arbitrary temporal positions. The pretraining stage gives the encoder dense history coverage on large-scale video data; the subsequent finetuning stage adapts the pretrained encoder under an autoregressive video generation objective to establish content-level consistency. In this way, the lightweight embeddings achieve performance comparable to heavier alternatives. We evaluate the framework under ablation settings and discuss the architecture design choices.
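As a rough illustration of the idea in the abstract, the sketch below compresses a history of per-frame feature vectors into a few embeddings and then scores how well the frame at a chosen temporal position can be recovered from them. It is not the authors' implementation: the function names (`compress_history`, `query_frame`, `frame_query_loss`), the average pooling, the proximity-based attention weights, and the `sharpness` parameter are all assumptions standing in for components that would be learned in the paper.

```python
import math
import random


def compress_history(frames, num_slots):
    """Average-pool T frame feature vectors into num_slots embeddings.

    Assumption: pooling is a stand-in for a learned history encoder.
    """
    total = len(frames)
    dim = len(frames[0])
    slots = []
    for k in range(num_slots):
        start = k * total // num_slots
        end = max(start + 1, (k + 1) * total // num_slots)
        chunk = frames[start:end]
        slots.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return slots


def query_frame(slots, t, num_frames, sharpness=4.0):
    """Read out a frame estimate at temporal index t via softmax attention.

    Assumption: attention logits depend only on temporal proximity between
    the queried position and each slot; a real model would learn them.
    """
    num_slots = len(slots)
    pos = t / max(num_frames - 1, 1) * (num_slots - 1)
    logits = [-sharpness * (pos - k) ** 2 for k in range(num_slots)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    norm = sum(weights)
    dim = len(slots[0])
    return [sum(w / norm * s[d] for w, s in zip(weights, slots)) for d in range(dim)]


def frame_query_loss(frames, slots, t):
    """Mean squared error between the queried estimate and the true frame at t."""
    pred = query_frame(slots, t, len(frames))
    target = frames[t]
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


if __name__ == "__main__":
    random.seed(0)
    history = [[random.random() for _ in range(8)] for _ in range(64)]
    embeddings = compress_history(history, num_slots=4)
    t = random.randrange(len(history))  # arbitrary temporal position
    print(len(embeddings), frame_query_loss(history, embeddings, t))
```

In a trainable version, the pooling and attention weights would be replaced by learned modules and the loss minimized over randomly sampled positions `t`, which is one plausible reading of "dense history coverage" under these assumptions.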

Topics & Keywords

cs.CV

Authors (9)

Lvmin Zhang

Shengqu Cai

Muyang Li

Chong Zeng

Beijia Lu

Anyi Rao

Song Han

Gordon Wetzstein

Maneesh Agrawala

Citation

Zhang, L., Cai, S., Li, M., Zeng, C., Lu, B., Rao, A. et al. (2025). Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding. https://arxiv.org/abs/2512.23851

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓