arXiv Open Access 2024

OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, +2 others

Abstract

Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. First, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Second, we propose a unified visual-linguistic framework that guides robots in successfully grasping both base and novel objects. Third, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves average accuracies of 71.2% and 64.4% on base and novel categories, respectively, in our new dataset.

Topics & Keywords

Authors (7)

Li Meng

Zhao Qi

Lyu Shuchang

Wang Chunlei

Ma Yujing

Cheng Guangliang

Yang Chenguang

Citation Format

Li, M., Zhao, Q., Lyu, S., Wang, C., Ma, Y., Cheng, G., & Yang, C. (2024). OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping. https://arxiv.org/abs/2407.13175

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓