arXiv Open Access 2026

WorldCompass: Reinforcement Learning for Long-Horizon World Models

Zehan Wang, Tengfei Wang, Haiyu Zhang, Xuhui Zuo, Junta Wu, +7 others

Abstract

This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to enhance model capacity efficiently and effectively. Evaluations on the state-of-the-art (SoTA) open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
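
To make the first two ideas concrete, the following is a minimal sketch of how several candidates sampled for one target clip might be scored and compared. It is not the authors' implementation: the weighted-sum combination, the group-wise normalization, the function names `combine_rewards` and `group_advantages`, and all numeric scores are assumptions made purely for illustration, since the abstract does not specify these details.

```python
from statistics import mean, pstdev


def combine_rewards(interaction, visual, w_interaction=0.5, w_visual=0.5):
    """Hypothetical weighted sum of the two reward types (an assumption)."""
    return [w_interaction * a + w_visual * v for a, v in zip(interaction, visual)]


def group_advantages(rewards, eps=1e-8):
    """Center and scale rewards across the candidates of one target clip.

    Group-wise normalization is an assumed choice, not taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical candidates sampled for the same target clip.
interaction_scores = [0.9, 0.4, 0.8, 0.2]  # made-up interaction-following scores
visual_scores = [0.2, 0.7, 0.8, 0.3]       # made-up visual-quality scores

rewards = combine_rewards(interaction_scores, visual_scores)
advantages = group_advantages(rewards)

# Index of the candidate with the highest combined, normalized score.
best = max(range(len(advantages)), key=advantages.__getitem__)
print(best, [round(a, 3) for a in advantages])
```

In this toy example the first candidate has the highest interaction score but poor visual quality, so it is not ranked best once both rewards are combined. This is one way a complementary reward could discourage optimizing a single signal at the expense of the other.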

Topics & Keywords

cs.CV

Authors (12)

Zehan Wang

Tengfei Wang

Haiyu Zhang

Xuhui Zuo

Junta Wu

Haoyuan Wang

Wenqiang Sun

Zhenwei Wang

Chenjie Cao

Hengshuang Zhao

Chunchao Guo

Zhou Zhao

Citation Format

Wang, Z., Wang, T., Zhang, H., Zuo, X., Wu, J., Wang, H. et al. (2026). WorldCompass: Reinforcement Learning for Long-Horizon World Models. https://arxiv.org/abs/2602.09022

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access ✓