arXiv Open Access 2026

Scaling World Model for Hierarchical Manipulation Policies

Qian Long, Yueze Wang, Jiaxi Song, Junbo Zhang, Peiyan Li, +11 others

Abstract

Vision-Language-Action (VLA) models are promising for generalist robot manipulation but remain brittle in out-of-distribution (OOD) settings, especially with limited real-robot data. To address this generalization bottleneck, we introduce VISTA, a hierarchical Vision-Language-Action framework that leverages the generalization of a large-scale pre-trained world model for robust and generalizable VIsual Subgoal TAsk decomposition. VISTA consists of a world model as the high-level planner and a VLA as the low-level executor. The high-level world model first divides a manipulation task into a sequence of subtasks with goal images, and the low-level policy follows this textual and visual guidance to generate action sequences. Compared to raw textual goal specification, these synthesized goal images provide visually and physically grounded details for the low-level policy, making it feasible to generalize across unseen objects and novel scenarios. We validate both visual goal synthesis and our hierarchical VLA policies in extensive out-of-distribution scenarios; with the guidance generated by the world model, the performance of the same-structured VLA in novel scenarios rises from 14% to 69%. Results demonstrate that our method outperforms previous baselines by a clear margin, particularly in out-of-distribution scenarios. Project page: https://vista-wm.github.io/
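
The two-level control flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `Subgoal`, `run_hierarchical_policy`, `plan_subgoals`, `act`, and `step` are hypothetical, images are stood in for by nested lists of floats, and the world model and VLA are passed in as plain callables. It also assumes the planner produces the whole subgoal sequence up front, which is how the abstract reads; any replanning the real system may do is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder types (assumptions): a real system would use image tensors
# and robot-specific action vectors.
Image = List[List[float]]
Action = List[float]


@dataclass
class Subgoal:
    """One subtask: a textual instruction plus a synthesized goal image."""
    instruction: str
    goal_image: Image


def run_hierarchical_policy(
    task: str,
    observation: Image,
    plan_subgoals: Callable[[str, Image], List[Subgoal]],
    act: Callable[[Image, Subgoal], List[Action]],
    step: Callable[[Image, Action], Image],
) -> List[Action]:
    """Run a high-level planner followed by a low-level executor.

    plan_subgoals: hypothetical stand-in for the world model; maps the task
        and the current observation to a sequence of subgoals.
    act: hypothetical stand-in for the VLA; maps the current observation and
        one subgoal (text + goal image) to a sequence of actions.
    step: hypothetical environment transition returning the next observation.
    """
    actions_taken: List[Action] = []
    # High level: decompose the task once into subtasks with goal images.
    for subgoal in plan_subgoals(task, observation):
        # Low level: generate actions conditioned on text and goal image.
        for action in act(observation, subgoal):
            observation = step(observation, action)
            actions_taken.append(action)
    return actions_taken
```

Passing the planner and executor as callables keeps the sketch independent of any particular model; the only point it illustrates is that the low-level policy receives a goal image alongside the textual instruction for each subtask.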

Authors (16)

Qian Long
Yueze Wang
Jiaxi Song
Junbo Zhang
Peiyan Li
Wenxuan Wang
Yuqi Wang
Haoyang Li
Shaoxuan Xie
Guocai Yao
Hanbo Zhang
Xinlong Wang
Zhongyuan Wang
Xuguang Lan
Huaping Liu
Xinghang Li

Citation Format

Long, Q., Wang, Y., Song, J., Zhang, J., Li, P., Wang, W. et al. (2026). Scaling World Model for Hierarchical Manipulation Policies. https://arxiv.org/abs/2602.10983

Journal Information
Publication year: 2026
Language: English
Source database: arXiv
Access: Open Access ✓