Semantic Scholar · Open Access · 2024 · 5 citations

Summary of the Visually Grounded Story Generation Challenge

Xudong Hong Asad Sayeed Vera Demberg

Abstract

Recent advancements in vision-and-language models have opened new possibilities for natural language generation, particularly in generating creative stories from visual input. We thus host an open-sourced shared task, Visually Grounded Story Generation (VGSG), to explore whether these models can create coherent, diverse, and visually grounded narratives. This task challenges participants to generate coherent stories based on sequences of images, where characters and events must be grounded in the images provided. The task is structured into two tracks: a Closed track constrained to fixed visual features, and an Open track that allows all kinds of models. We propose the first two-stage model using GPT-4o as the baseline for the Open track, which first generates descriptions for the images and then creates a story based on those descriptions. Human and automatic evaluations indicate that: 1) retrieval augmentation helps generate more human-like stories; 2) large-scale pre-trained LLMs improve story quality by a large margin; and 3) traditional automatic metrics cannot capture the overall quality.

Authors (3)

Xudong Hong

Asad Sayeed

Vera Demberg

Citation Format

Hong, X., Sayeed, A., & Demberg, V. (2024). Summary of the Visually Grounded Story Generation Challenge. https://doi.org/10.18653/v1/2024.inlg-genchal.3

Quick Access

Journal Information
Publication Year
2024
Language
en
Total Citations
5
Database Source
Semantic Scholar
DOI
10.18653/v1/2024.inlg-genchal.3
Access
Open Access ✓