Semantic Scholar, Open Access, 2024, 5 citations

FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles

Raia Abu Ahmad E. Borisova Georg Rehm

Abstract

The steep increase in the number of scholarly publications has given rise to various digital repositories, libraries and knowledge graphs aimed to capture, manage, and preserve scientific data. Efficiently navigating such databases requires a system able to classify scholarly documents according to the respective research (sub-)field. However, not every digital repository possesses a relevant classification schema for categorising publications. For instance, one of the largest digital archives in Computational Linguistics (CL) and Natural Language Processing (NLP), the ACL Anthology, lacks a system for classifying papers into topics and sub-topics. This paper addresses this gap by constructing a corpus of 1,500 ACL Anthology publications annotated with their main contributions using a novel hierarchical taxonomy of core CL/NLP topics and sub-topics. The corpus is used in a shared task with the goal of classifying CL/NLP papers into their respective sub-topics.

Authors (3)

Raia Abu Ahmad

E. Borisova

Georg Rehm

Citation Format

Ahmad, R.A., Borisova, E., Rehm, G. (2024). FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles. https://doi.org/10.63317/3bhrpiumak6i

Journal Information

Publication Year: 2024
Language: en
Total Citations: 5
Source Database: Semantic Scholar
DOI: 10.63317/3bhrpiumak6i
Access: Open Access