arXiv Open Access 2025

Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding

Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, +4 others

Abstract

Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, efficiently encoding them remains an open problem, particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encoder that maps long video histories into short embeddings, pretrained with a frame query objective that learns to attend to content features at arbitrary temporal positions. The pretraining stage provides the encoder with dense history coverage on large-scale video data; the subsequent finetuning stage adapts the pretrained encoder under an autoregressive video generation objective to establish content-level consistency. In this way, the lightweight embeddings achieve performance comparable to heavier alternatives. We evaluate the framework with ablative settings and discuss the architecture designs.
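
The abstract gives no architectural details, so the following is only a toy sketch of the general idea: compress a long sequence of per-frame features into a few embeddings, then train by querying that short embedding for the features of a frame at an arbitrary time index. Everything here is an assumption made for illustration, not the authors' design: the cross-attention pooling, the mean-squared-error loss, the random "time codes", and all names (`encode_history`, `query_frame`, `frame_query_loss`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Scaled dot-product attention: queries (Q, D), keys/values (N, D) -> (Q, D).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def encode_history(frame_feats, latent_queries):
    # Hypothetical encoder: K latent queries pool T frame features, with K << T.
    return cross_attention(latent_queries, frame_feats, frame_feats)


def query_frame(embedding, time_query):
    # Hypothetical readout: estimate one frame's features from the short embedding.
    return cross_attention(time_query, embedding, embedding)


def frame_query_loss(frame_feats, latent_queries, time_codes, t):
    # Assumed objective: MSE between the queried estimate and the true features at time t.
    embedding = encode_history(frame_feats, latent_queries)
    pred = query_frame(embedding, time_codes[t:t + 1])
    return float(np.mean((pred - frame_feats[t:t + 1]) ** 2))


rng = np.random.default_rng(0)
T, K, D = 64, 8, 16                        # history length, embedding length, feature width
frame_feats = rng.normal(size=(T, D))      # stand-in for per-frame content features
latent_queries = rng.normal(size=(K, D))   # would be learned parameters in practice
time_codes = rng.normal(size=(T, D))       # stand-in for temporal position encodings

embedding = encode_history(frame_feats, latent_queries)
loss = frame_query_loss(frame_feats, latent_queries, time_codes, t=int(rng.integers(T)))
print(embedding.shape, loss)
```

In this sketch the history of 64 frames is reduced to 8 embedding vectors, and sampling `t` uniformly over the history is one plausible way to push the embedding toward covering every temporal position. A real system would learn the queries and time codes by gradient descent and would then finetune the encoder inside the video generator, as the abstract describes.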

Authors (9)

Lvmin Zhang
Shengqu Cai
Muyang Li
Chong Zeng
Beijia Lu
Anyi Rao
Song Han
Gordon Wetzstein
Maneesh Agrawala

Citation Format

Zhang, L., Cai, S., Li, M., Zeng, C., Lu, B., Rao, A. et al. (2025). Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding. https://arxiv.org/abs/2512.23851

Journal Information
Publication Year: 2025
Language: en
Database Source: arXiv
Access: Open Access