DOAJ Open Access 2025

BITCC: A Bidirectional Image–Text Interaction Method for High-Resolution Remote Sensing Image Change Captioning

Yingjie Tang, Shou Feng, Yongqi Chen, Jinghe Zhang, Nan Su, +1 more

Abstract

High-resolution remote sensing image change captioning (RSICC) aims to understand the change content in bitemporal high-resolution remote sensing images and generate corresponding descriptive captions. Because it presents change information in natural language, making that information more intuitive and easier to communicate, the task has garnered widespread attention. However, two challenges remain in RSICC. First, most existing methods adopt a unidirectional interaction from images to text, which yields insufficient semantic alignment between images and text and limits their performance. Second, interfering factors in remote sensing images, such as illumination and climate, cause overall differences between bitemporal images that hinder the recognition of change information. To address these challenges, this article proposes a bidirectional image–text interaction method for high-resolution RSICC (BITCC). BITCC first introduces an image-to-text interaction component based on reconstruction. Together with the caption generation component, this component forms a bidirectional interaction that enhances the semantic correlation between the local change information of the images and the textual information. To address the global discrepancies between bitemporal images, a noise-based change extractor is designed, which reduces the model's focus on irrelevant factors by adding noise. Finally, the images-and-text interaction component constrains the global representations of both modalities through contrastive alignment, enhancing the global semantic consistency between image and text in the high-level representation. Experiments on two public datasets show that the proposed method outperforms current state-of-the-art methods.
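
The abstract does not specify the form of the contrastive alignment it applies to the global image and text representations. As an illustration only, the sketch below uses a symmetric InfoNCE-style loss, a common choice for this kind of alignment; it is an assumption, not the authors' confirmed objective. The function names, the temperature value of 0.07, and the toy two-dimensional features are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired global features.

    Assumption: image_feats[i] and text_feats[i] describe the same sample,
    and every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # Similarity matrix: rows index images, columns index texts.
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: matched pairs give a low loss, mismatched pairs a high one.
aligned = contrastive_alignment_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
shuffled = contrastive_alignment_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
)
print(aligned < shuffled)
```

Minimizing a loss of this shape pulls each image representation toward its paired caption representation and pushes it away from the other captions in the batch, which is one way to encourage the global semantic consistency the abstract describes.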

Authors (6)

Yingjie Tang
Shou Feng
Yongqi Chen
Jinghe Zhang
Nan Su
Chunhui Zhao

Citation Format

Tang, Y., Feng, S., Chen, Y., Zhang, J., Su, N., Zhao, C. (2025). BITCC: A Bidirectional Image–Text Interaction Method for High-Resolution Remote Sensing Image Change Captioning. https://doi.org/10.1109/JSTARS.2025.3629158

Journal Information
Publication Year: 2025
Source Database: DOAJ
DOI: 10.1109/JSTARS.2025.3629158
Access: Open Access