Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection
Abstract

Active object detection (AOD) is a crucial task in the field of robotics. A key challenge for AOD in household environments is that the target object is often undetectable due to partial occlusion, which causes traditional methods to fail. To address the occlusion problem, this paper first proposes a novel occlusion handling method based on a large multimodal model (LMM). The method uses an LMM to detect and analyse input RGB images and generates adjustment actions that progressively eliminate the occlusion. Once the occlusion has been handled, an improved AOD method based on a deep Q-learning network (DQN) completes the task. We introduce an attention mechanism to process image features, enabling the model to focus on critical regions of the input images. Additionally, a new reward function is proposed that jointly considers the bounding box of the target object, the robot's distance to the object, and the actions performed by the robot. Experiments on the dataset and in real-world scenarios validate the effectiveness of the proposed method for AOD tasks under partial occlusion.
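The abstract's reward function, which combines the target's bounding box, the robot-to-object distance, and an action cost, could be sketched roughly as follows. The exact terms, weights, and the target standoff distance here are illustrative assumptions, not the paper's actual formulation:

```python
def aod_reward(bbox_area_ratio: float,
               distance: float,
               target_distance: float = 0.5,
               step_cost: float = 0.05) -> float:
    """Illustrative AOD reward (not the paper's formula).

    bbox_area_ratio: fraction of the view covered by the target's
        bounding box, in [0, 1]; larger means a better observation.
    distance: current robot-to-object distance in metres.
    target_distance: assumed desired standoff distance (hypothetical).
    step_cost: small per-action penalty to encourage short episodes.
    """
    bbox_term = bbox_area_ratio                   # reward visibility
    dist_term = -abs(distance - target_distance)  # peaks at the standoff
    return bbox_term + dist_term - step_cost
```

Under this sketch, the reward is highest when the object fills more of the view and the robot sits near the desired standoff, while every extra action subtracts a small cost.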
Authors (4)
Aoqi Wang
Guohui Tian
Yuhao Wang
Zhongyang Li
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.1049/csy2.70008
- Access
- Open Access ✓