3CEL: A corpus of legal Spanish contract clauses
Abstract
Legal corpora for Natural Language Processing (NLP) are valuable but scarce resources in languages like Spanish for two main reasons: limited data accessibility and limited availability of legal expert knowledge. INESData 2024 is a European Union-funded project led by the Universidad Politécnica de Madrid (UPM) and developed by the Instituto de Ingeniería del Conocimiento (IIC) to create a series of state-of-the-art NLP resources for the legal/administrative domain in Spanish. This paper presents the Corpus of Legal Spanish Contract Clauses (3CEL), a contract information extraction corpus developed within the framework of INESData 2024. 3CEL contains 373 manually annotated tenders using 19 defined categories (4,782 tags in total) that identify key information for contract understanding and review.
Authors (8)
Nuria Aldama García
Patricia Marsà Morales
David Betancur Sánchez
Álvaro Barbero Jiménez
Marta Guerrero Nieto
Pablo Haya Coll
Patricia Martín Chozas
Elena Montiel Ponsoda
Quick Access
- Publication Year: 2025
- Language: en
- Database Source: arXiv
- Access: Open Access ✓