DOAJ Open Access 2025

Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection

Aoqi Wang, Guohui Tian, Yuhao Wang, Zhongyang Li

Abstract

Active object detection (AOD) is a crucial task in the field of robotics. A key challenge for AOD in household environments is that the target object is often undetectable due to partial occlusion, which causes traditional methods to fail. To address the occlusion problem, this paper first proposes a novel occlusion handling method based on a large multimodal model (LMM). The method utilises an LMM to detect and analyse input RGB images and generates adjustment actions to progressively eliminate occlusion. After the occlusion is handled, an improved AOD method based on a deep Q‐learning network (DQN) is used to complete the task. We introduce an attention mechanism to process image features, enabling the model to focus on critical regions of the input images. Additionally, a new reward function is proposed that jointly considers the bounding box of the target object, the robot's distance to the object, and the actions performed by the robot. Experiments on a dataset and in real‐world scenarios validate the effectiveness of the proposed method in performing AOD tasks under partial occlusion.
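The abstract names three inputs to the reward function (the target's bounding box, the robot's distance to the object, and the action taken) but does not give the formula. The following is only an illustrative sketch of how such signals could be combined; the `Observation` fields, the weights, the step cost, and the linear form are all assumptions made here and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-step observation; field names are illustrative only."""
    box_area_ratio: float  # target bounding-box area / image area, in [0, 1]
    distance_m: float      # robot-to-object distance in metres


def step_reward(prev: Observation, curr: Observation,
                w_box: float = 1.0, w_dist: float = 1.0,
                step_cost: float = 0.05) -> float:
    """Toy reward combining the three signals named in the abstract.

    The weights and the linear combination are assumptions, not the
    paper's actual reward function.
    """
    # A larger visible bounding box suggests a better view of the target.
    box_gain = curr.box_area_ratio - prev.box_area_ratio
    # Moving closer to the object is rewarded.
    dist_gain = prev.distance_m - curr.distance_m
    # Every action pays a small fixed cost, which favours short paths.
    return w_box * box_gain + w_dist * dist_gain - step_cost


if __name__ == "__main__":
    before = Observation(box_area_ratio=0.10, distance_m=2.0)
    after = Observation(box_area_ratio=0.15, distance_m=1.5)
    print(round(step_reward(before, after), 3))
```

In a DQN setting, a scalar of this kind would be the per-step reward used in the temporal-difference target; the real method may weight or shape these terms quite differently.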

Authors (4)

Aoqi Wang

Guohui Tian

Yuhao Wang

Zhongyang Li

Citation Format

Wang, A., Tian, G., Wang, Y., Li, Z. (2025). Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection. https://doi.org/10.1049/csy2.70008

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.1049/csy2.70008
Access: Open Access