arXiv Open Access 2025

DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking

Lanni Bu, Lauren Levine, Amir Zeldes

Abstract

Recent LLM benchmarks have tested models on a range of phenomena, but they still focus primarily on natural language understanding for extracting explicit information, as in QA or summarization, with responses often targeting information from individual sentences. We still lack more challenging, and importantly multilingual, benchmarks that focus on implicit information and pragmatic inferences across larger documents in the context of discourse tracking: integrating and aggregating information across sentences, paragraphs, and multiple speaker utterances. To this end, we present DiscoTrack, an LLM benchmark targeting a range of tasks across 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations, and bridging inference. Our evaluation shows that these tasks remain challenging, even for state-of-the-art models.

Citation Format

Bu, L., Levine, L., Zeldes, A. (2025). DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking. https://arxiv.org/abs/2510.17013

Publication Information
Publication year: 2025
Language: English
Database source: arXiv
Access: Open Access ✓