arXiv Open Access 2019

A Corpus for Automatic Readability Assessment and Text Simplification of German

Alessia Battisti, Sarah Ebling
Abstract

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.

Authors (2)

Alessia Battisti

Sarah Ebling

Citation Format

Battisti, A., Ebling, S. (2019). A Corpus for Automatic Readability Assessment and Text Simplification of German. https://arxiv.org/abs/1909.09067

Journal Information
Publication Year
2019
Language
en
Source Database
arXiv
Access
Open Access ✓