arXiv Open Access 2025

3CEL: A corpus of legal Spanish contract clauses

Nuria Aldama García, Patricia Marsà Morales, David Betancur Sánchez, Álvaro Barbero Jiménez, Marta Guerrero Nieto, and 3 others

Abstract

Legal corpora for Natural Language Processing (NLP) are valuable and scarce resources in languages such as Spanish, for two main reasons: data accessibility and the availability of legal expert knowledge. INESData 2024 is a European Union-funded project led by the Universidad Politécnica de Madrid (UPM) and developed by the Instituto de Ingeniería del Conocimiento (IIC) to create a series of state-of-the-art NLP resources for the legal and administrative domain in Spanish. The goal of this paper is to present the Corpus of Legal Spanish Contract Clauses (3CEL), a contract information extraction corpus developed within the framework of INESData 2024. 3CEL contains 373 manually annotated tenders, tagged with 19 defined categories (4,782 tags in total) that identify key information for understanding and reviewing contracts.
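For a rough sense of annotation density, the counts above give the following simple averages. These are derived here from the stated totals, are not figures reported by the authors, and hide how tags are actually distributed across individual tenders and categories:

```latex
\frac{4\,782\ \text{tags}}{373\ \text{tenders}} \approx 12.8\ \text{tags per tender},
\qquad
\frac{4\,782\ \text{tags}}{19\ \text{categories}} \approx 251.7\ \text{tags per category}
```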


Authors (8)

Nuria Aldama García

Patricia Marsà Morales

David Betancur Sánchez

Álvaro Barbero Jiménez

Marta Guerrero Nieto

Pablo Haya Coll

Patricia Martín Chozas

Elena Montiel Ponsoda

Citation

Aldama García, N., Marsà Morales, P., Betancur Sánchez, D., Barbero Jiménez, Á., Guerrero Nieto, M., Haya Coll, P. et al. (2025). 3CEL: A corpus of legal Spanish contract clauses. https://arxiv.org/abs/2501.15990

Publication Information
Year Published: 2025
Language: English
Database Source: arXiv
Access: Open Access