DOAJ Open Access 2021

Data Curation in Practice: Extract Tabular Data from PDF Files Using a Data Analytics Tool

Allis J. Choi, Xuying Xin

Abstract

Data curation is the process of managing data so that it is available for reuse and preservation and supports FAIR (findable, accessible, interoperable, reusable) use. It is an important part of the research lifecycle because researchers are often either required by funders or generally encouraged to preserve their datasets and make them discoverable and reusable. This has become especially important as the Open Access (OA) policy is being implemented at many institutions across the nation. An efficient data repository and its data curation play key roles in facilitating research data discovery and easing reuse. In this article, we briefly discuss the local institutional repository at Penn State University and the general data curation practices we adopt for deposited files and datasets; we then focus on a data analytics tool that has recently been applied to extract tabular data from PDF files. This enhances the existing data curation practices by adding tabular data to deposits containing PDF files, where tables are often embedded and not easily reused.

Authors (2)

Allis J. Choi

Xuying Xin

Citation Format

Choi, A.J., Xin, X. (2021). Data Curation in Practice: Extract Tabular Data from PDF Files Using a Data Analytics Tool. https://doi.org/10.7191/jeslib.2021.1209

Journal Information
Publication Year: 2021
Source Database: DOAJ
DOI: 10.7191/jeslib.2021.1209
Access: Open Access