arXiv Open Access 2025

X-MoGen: Unified Motion Generation across Humans and Animals

Xuan Wang, Kai Ruan, Liyang Qian, Zhizhi Guo, Chang Su, Gaoang Wang

Abstract

Text-driven motion generation has attracted increasing attention due to its broad applications in virtual reality, animation, and robotics. While existing methods typically model human and animal motion separately, a joint cross-species approach offers key advantages, such as a unified representation and improved generalization. However, morphological differences across species remain a key challenge, often compromising motion plausibility. To address this, we propose X-MoGen, the first unified framework for cross-species text-driven motion generation covering both humans and animals. X-MoGen adopts a two-stage architecture. First, a conditional graph variational autoencoder learns canonical T-pose priors, while an autoencoder encodes motion into a shared latent space regularized by morphological loss. In the second stage, we perform masked motion modeling to generate motion embeddings conditioned on textual descriptions. During training, a morphological consistency module is employed to promote skeletal plausibility across species. To support unified modeling, we construct UniMo4D, a large-scale dataset of 115 species and 119k motion sequences, which integrates human and animal motions under a shared skeletal topology for joint training. Extensive experiments on UniMo4D demonstrate that X-MoGen outperforms state-of-the-art methods on both seen and unseen species.
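The second stage described above trains by masking parts of a motion-embedding sequence and predicting them back, conditioned on text. The following is a minimal sketch of that masking-and-reconstruction objective, not the paper's implementation: the motion embeddings, the zero-vector stand-in for the [MASK] token, and the identity "predictor" are all illustrative assumptions (in X-MoGen, a text-conditioned model would produce the predictions).

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_motion_sequence(motion, mask_ratio=0.4, rng=rng):
    """Randomly replace a fraction of frame embeddings with a [MASK]
    token (here a zero vector), as in masked motion modeling."""
    num_frames, _ = motion.shape
    n_mask = int(round(mask_ratio * num_frames))
    idx = rng.choice(num_frames, size=n_mask, replace=False)
    masked = motion.copy()
    masked[idx] = 0.0  # stand-in for a learned [MASK] embedding
    return masked, idx

# Toy motion: 10 frames of 8-dim latent embeddings, standing in for the
# output of the stage-1 autoencoder's shared latent space.
motion = rng.normal(size=(10, 8))
masked, idx = mask_motion_sequence(motion, mask_ratio=0.4)

# Training objective: reconstruct the original embeddings at the masked
# positions. A text-conditioned transformer would produce `pred`; the
# identity map here just makes the loss computation concrete.
pred = masked
loss = np.mean((pred[idx] - motion[idx]) ** 2)
print(len(idx))  # number of masked frames
```

At inference, the same model would start from a fully masked sequence and iteratively fill in embeddings guided by the text description, then decode them back to motion with the stage-1 decoder.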


Authors (6)

Xuan Wang
Kai Ruan
Liyang Qian
Zhizhi Guo
Chang Su
Gaoang Wang

Citation Format

Wang, X., Ruan, K., Qian, L., Guo, Z., Su, C., & Wang, G. (2025). X-MoGen: Unified Motion Generation across Humans and Animals. arXiv preprint arXiv:2508.05162. https://arxiv.org/abs/2508.05162

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓