Semantic Scholar · Open Access · 2024 · 5 citations

Summary of the Visually Grounded Story Generation Challenge

Xudong Hong Asad Sayeed Vera Demberg

Abstract

Recent advancements in vision-and-language models have opened new possibilities for natural language generation, particularly in generating creative stories from visual input. We thus host an open-sourced shared task, Visually Grounded Story Generation (VGSG), to explore whether these models can create coherent, diverse, and visually grounded narratives. This task challenges participants to generate coherent stories based on sequences of images, where characters and events must be grounded in the images provided. The task is structured into two tracks: a Closed track constrained to fixed visual features, and an Open track that allows all kinds of models. We propose the first two-stage model using GPT-4o as the baseline for the Open track, which first generates descriptions for the images and then creates a story based on those descriptions. Human and automatic evaluations indicate that: 1) retrieval augmentation helps generate more human-like stories; 2) large-scale pre-trained LLMs improve story quality by a large margin; and 3) traditional automatic metrics cannot capture the overall quality.

Authors (3)

Xudong Hong

Asad Sayeed

Vera Demberg

Citation Format

Hong, X., Sayeed, A., & Demberg, V. (2024). Summary of the Visually Grounded Story Generation Challenge. https://doi.org/10.18653/v1/2024.inlg-genchal.3

Quick Access

Journal Information
Publication Year
2024
Language
en
Total Citations
5
Database Source
Semantic Scholar
DOI
10.18653/v1/2024.inlg-genchal.3
Access
Open Access ✓