arXiv Open Access 2024

OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, +2 others

Abstract

Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. First, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Second, we propose a unified visual-linguistic framework that guides robots in successfully grasping both base and novel objects. Third, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves average accuracies of 71.2% and 64.4% on base and novel categories, respectively, in our new dataset.

Topics & Keywords

Authors (7)

Li Meng

Zhao Qi

Lyu Shuchang

Wang Chunlei

Ma Yujing

Cheng Guangliang

Yang Chenguang

Citation Format

Li, M., Zhao, Q., Lyu, S., Wang, C., Ma, Y., Cheng, G., & Yang, C. (2024). OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping. https://arxiv.org/abs/2407.13175

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓