arXiv Open Access 2023

Sentence Simplification Using Paraphrase Corpus for Initialization

Kang Liu Jipeng Qiang

Abstract

Neural methods based on the sequence-to-sequence framework have become the mainstream approach to the sentence simplification (SS) task. Unfortunately, these methods are currently limited by the scarcity of parallel SS corpora. In this paper, we focus on reducing the dependence on parallel corpora by carefully initializing neural SS methods from a paraphrase corpus. Our work is motivated by two findings: (1) a paraphrase corpus includes a large proportion of sentence pairs that qualify as SS pairs; (2) we can construct large-scale pseudo-parallel SS data by keeping the sentence pairs with a higher complexity difference. We therefore propose two strategies for initializing neural SS methods with a paraphrase corpus. We train three different neural SS methods with our initialization and obtain substantial improvements on the available WikiLarge data compared with the same methods trained without initialization.
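
The second finding, keeping paraphrase pairs with a higher complexity difference, can be illustrated with a minimal sketch. The abstract does not say how complexity is measured, so everything here is an assumption for illustration: the Flesch Reading Ease formula stands in as the complexity measure, the syllable counter is a rough vowel-group heuristic, and the function names, the min_gain threshold, and the step that reorients each pair so the harder sentence becomes the source are all hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, at least one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(sentence: str) -> float:
    """Flesch Reading Ease of a single sentence (higher means easier)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    # One sentence, so words-per-sentence is simply the word count.
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))


def select_pseudo_ss_pairs(pairs, min_gain=10.0):
    """Keep pairs whose readability gap is at least min_gain.

    Each kept pair is returned as (complex, simple). The threshold and
    the reorientation are illustrative assumptions, not the paper's.
    """
    selected = []
    for a, b in pairs:
        score_a, score_b = flesch_reading_ease(a), flesch_reading_ease(b)
        if abs(score_a - score_b) < min_gain:
            continue  # too similar in difficulty to serve as SS data
        selected.append((a, b) if score_a < score_b else (b, a))
    return selected


paraphrase_pairs = [
    ("The physician administered the medication immediately.",
     "The doctor gave the drug at once."),
    ("He went home.", "He went home now."),
]
pseudo_ss_data = select_pseudo_ss_pairs(paraphrase_pairs)
```

With these toy inputs, only the first pair survives the filter; the second is dropped because its two sentences score almost the same. Data selected this way would serve to initialize a model before training on a real parallel corpus such as WikiLarge.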

Authors (2)

Kang Liu

Jipeng Qiang

Citation Format

Liu, K., Qiang, J. (2023). Sentence Simplification Using Paraphrase Corpus for Initialization. https://arxiv.org/abs/2305.19754

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access