Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection
Abstract

Active object detection (AOD) is a crucial task in the field of robotics. A key challenge for AOD in household environments is that the target object is often undetectable due to partial occlusion, which causes traditional methods to fail. To address the occlusion problem, this paper first proposes a novel occlusion handling method based on a large multimodal model (LMM). The method uses an LMM to detect and analyse input RGB images and generates adjustment actions that progressively eliminate the occlusion. Once the occlusion has been handled, an improved AOD method based on a deep Q-learning network (DQN) completes the task. We introduce an attention mechanism to process image features, enabling the model to focus on critical regions of the input images. Additionally, a new reward function is proposed that jointly considers the bounding box of the target object, the robot's distance to the object, and the actions performed by the robot. Experiments on the dataset and in real-world scenarios validate the effectiveness of the proposed method for AOD tasks under partial occlusion.
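The abstract's reward function, which combines the target's bounding box, the robot-to-object distance, and an action cost, could be sketched roughly as follows. The exact terms, weights, and the target standoff distance here are illustrative assumptions, not the paper's actual formulation:

```python
def aod_reward(bbox_area_ratio: float,
               distance: float,
               target_distance: float = 0.5,
               step_cost: float = 0.05) -> float:
    """Illustrative AOD reward (not the paper's formula).

    bbox_area_ratio: fraction of the view covered by the target's
        bounding box, in [0, 1]; larger means a better observation.
    distance: current robot-to-object distance in metres.
    target_distance: assumed desired standoff distance (hypothetical).
    step_cost: small per-action penalty to encourage short episodes.
    """
    bbox_term = bbox_area_ratio                   # reward visibility
    dist_term = -abs(distance - target_distance)  # peaks at the standoff
    return bbox_term + dist_term - step_cost
```

Under this sketch, the reward is highest when the object fills more of the view and the robot sits near the desired standoff, while every extra action subtracts a small cost.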
Authors (4)
Aoqi Wang
Guohui Tian
Yuhao Wang
Zhongyang Li
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.1049/csy2.70008
- Access
- Open Access ✓