arXiv Open Access 2023

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, +5 others

Abstract

Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on intra-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate inter-sentence discourse properties across a diverse set of NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena (e.g., cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. In total, we evaluate 20 general, in-domain, and commercial models based on Transformer, advanced pretraining architectures, and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark; and (2) that fine-grained pretraining on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.

Authors (10)

Longyue Wang

Zefeng Du

Donghuai Liu

Deng Cai

Dian Yu

Haiyun Jiang

Yan Wang

Leyang Cui

Shuming Shi

Zhaopeng Tu

Citation Format

Wang, L., Du, Z., Liu, D., Cai, D., Yu, D., Jiang, H. et al. (2023). Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling. https://arxiv.org/abs/2307.08074

Journal Information
Publication Year
2023
Language
en
Source Database
arXiv
Access
Open Access ✓