arXiv Open Access 2024

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven Aliaksandr Siarohin Sergey Tulyakov Philip Torr Fabio Pizzati

Abstract

We propose DiTFlow, a method for transferring the motion of a reference video to a newly synthesized one, designed specifically for Diffusion Transformers (DiT). We first process the reference video with a pre-trained DiT to analyze cross-frame attention maps and extract a patch-wise motion signal called the Attention Motion Flow (AMF). We then guide the latent denoising process in an optimization-based, training-free manner, optimizing the latents with our AMF loss so that the generated video reproduces the motion of the reference. We also apply our optimization strategy to the transformer positional embeddings, which boosts zero-shot motion transfer capabilities. We evaluate DiTFlow against recently published methods and outperform all of them across multiple metrics and in human evaluation.
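
The abstract does not say how the AMF or its loss is computed, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes that a patch-wise motion signal can be read from cross-frame attention as the attention-weighted expected displacement of each patch (a soft-argmax), and that the loss is a mean squared difference between reference and generated displacements. The function names `attention_displacements` and `amf_loss` are hypothetical, and the gradient-based optimization of latents or positional embeddings is not shown.

```python
import math


def _softmax(logits):
    # Numerically stable softmax over a list of floats.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_displacements(queries, keys, grid_w):
    """Assumed AMF-like signal: one expected (dx, dy) per patch.

    queries: patch tokens of one frame, keys: patch tokens of another frame,
    both flattened row by row from a grid that is grid_w patches wide.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    flows = []
    for p, q in enumerate(queries):
        # Cross-frame attention of patch p over all patches of the other frame.
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = _softmax(logits)
        px, py = p % grid_w, p // grid_w
        # Soft-argmax: attention-weighted average offset, in patch units.
        dx = sum(w * (t % grid_w - px) for t, w in enumerate(weights))
        dy = sum(w * (t // grid_w - py) for t, w in enumerate(weights))
        flows.append((dx, dy))
    return flows


def amf_loss(ref_flows, gen_flows):
    """Assumed loss: mean squared distance between per-patch displacements."""
    total = 0.0
    for (rx, ry), (gx, gy) in zip(ref_flows, gen_flows):
        total += (rx - gx) ** 2 + (ry - gy) ** 2
    return total / len(ref_flows)
```

In a training-free guidance loop of the kind the abstract describes, a loss like this would be evaluated on attention from the video being generated and its gradient used to update the latents at selected denoising steps. That step needs an autodiff framework and a real DiT, both of which are outside this sketch.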

Citation

Pondaven, A., Siarohin, A., Tulyakov, S., Torr, P., Pizzati, F. (2024). Video Motion Transfer with Diffusion Transformers. https://arxiv.org/abs/2412.07776

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓