arXiv Open Access 2023

Explaining Vision and Language through Graphs of Events in Space and Time

Mihai Masala Nicolae Cudlenco Traian Rebedea Marius Leordeanu

Abstract

Artificial Intelligence is making great advances and is starting to bridge the gap between vision and language. However, we are still far from understanding, explaining and explicitly controlling visual content from a linguistic perspective, because we still lack a common explainable representation between the two domains. In this work we address this limitation and propose the Graph of Events in Space and Time (GEST), by which we can represent, create and explain both visual and linguistic stories. We provide a theoretical justification of our model and an experimental validation, which shows that GEST can bring solid complementary value alongside powerful deep learning models. In particular, GEST can help improve the generation of videos from text at the content level, by being easily incorporated into our novel video generation engine. Additionally, by using efficient graph matching techniques, GEST graphs can also improve comparisons between texts at the semantic level.

Topics & Keywords

cs.AI cs.CL cs.CV

Citation Format

Masala, M., Cudlenco, N., Rebedea, T., & Leordeanu, M. (2023). Explaining Vision and Language through Graphs of Events in Space and Time. https://arxiv.org/abs/2309.08612

Journal Information

Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓