DOAJ Open Access 2025

CVCPSG: Discovering Composite Visual Clues for Panoptic Scene Graph Generation

Nanhao Liang, Xiaoyuan Yang, Yingwei Xia, Yong Liu

Abstract

Panoptic Scene Graph Generation (PSG) aims to segment objects and predict the relation triplets <subject, relation, object> within an image. Despite impressive progress in PSG, current methods still struggle to capture fine-grained visual context: they favor visual features tied to object identity and neglect spatial and situational information. This limitation impedes the model's ability to distinguish subtle visual differences between relation triplets, such as "cat-on-person" and "cat-lying on-person". To address this challenge, we propose CVCPSG, a novel DETR-based method that uncovers composite visual clues for PSG. Specifically, drawing inspiration from how humans use diverse visual clues to capture visual context, we first construct a composite visual clues bank covering three key aspects: object, spatial, and situational. Then, we introduce a multi-level visual extractor that aligns visual features at the object, interaction, and image levels with the composite visual clues bank. Additionally, we incorporate a cross-modal learning module with a multi-tower architecture to integrate the visual clues into the relation decoder, thereby improving PSG detection. Extensive experiments on two PSG benchmarks confirm the effectiveness and interpretability of CVCPSG.
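The abstract does not say how visual features are aligned with the clue bank. Purely as an illustration, the sketch below scores one visual feature vector against text-side clue embeddings for each of the three aspects (object, spatial, situational) using cosine similarity and a temperature-scaled softmax, a common CLIP-style matching pattern. This is an assumption, not CVCPSG's actual method: `CLUE_BANK`, its made-up 3-d embeddings, `align_to_clues`, and `top_clues` are all hypothetical.

```python
import math

# Hypothetical toy clue bank: aspect -> list of (clue text, embedding).
# In practice embeddings would come from a text encoder; these are made up.
CLUE_BANK = {
    "object": [("a cat", [1.0, 0.0, 0.0]), ("a person", [0.0, 1.0, 0.0])],
    "spatial": [("on top of", [0.0, 0.0, 1.0]), ("next to", [0.7, 0.7, 0.0])],
    "situational": [("lying down", [0.6, 0.0, 0.8]), ("standing", [0.0, 0.6, 0.8])],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; lower temperature gives sharper distributions."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_to_clues(feature, bank=CLUE_BANK, temperature=0.1):
    """For each aspect, return a probability distribution over that aspect's clues."""
    result = {}
    for aspect, clues in bank.items():
        scores = [cosine(feature, emb) for _, emb in clues]
        probs = softmax(scores, temperature)
        result[aspect] = {text: p for (text, _), p in zip(clues, probs)}
    return result


def top_clues(feature, bank=CLUE_BANK):
    """Pick the most likely clue per aspect for one visual feature."""
    aligned = align_to_clues(feature, bank)
    return {aspect: max(dist, key=dist.get) for aspect, dist in aligned.items()}


if __name__ == "__main__":
    # A toy feature that happens to point toward "cat", "on top of", "lying down".
    print(top_clues([0.5, 0.0, 0.9]))
```

Keeping a separate distribution per aspect, rather than one softmax over all clues, mirrors the abstract's split of the bank into object, spatial, and situational aspects, so a spatial clue and a situational clue (e.g. "on top of" plus "lying down") can both be active for the same region.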

Topics & Keywords

Electronic computers. Computer science

Citation

Liang, N., Yang, X., Xia, Y., & Liu, Y. (2025). CVCPSG: Discovering Composite Visual Clues for Panoptic Scene Graph Generation. https://doi.org/10.1007/s44443-025-00063-w

Journal Information

Publication Year: 2025
Database Source: DOAJ
DOI: 10.1007/s44443-025-00063-w
Access: Open Access