Semantic Scholar Open Access 2025 8 citations

ITALIC: An Italian Culture-Aware Natural Language Benchmark

Andrea Seveso Daniele Potertì Edoardo Federici Mario Mezzanzanica Fabio Mercorio

Abstract

We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. ITALIC spans 12 domains, exploiting public tests to score domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies. ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. We establish baseline performances using 17 state-of-the-art LLMs, revealing current limitations in Italian language understanding and highlighting significant linguistic complexity and cultural specificity challenges. ITALIC serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.

Authors (5)

Andrea Seveso

Daniele Potertì

Edoardo Federici

Mario Mezzanzanica

Fabio Mercorio

Citation Format

Seveso, A., Potertì, D., Federici, E., Mezzanzanica, M., Mercorio, F. (2025). ITALIC: An Italian Culture-Aware Natural Language Benchmark. https://doi.org/10.18653/v1/2025.naacl-long.68

Journal Information
Publication Year
2025
Language
en
Total Citations
8
Source Database
Semantic Scholar
DOI
10.18653/v1/2025.naacl-long.68
Access
Open Access ✓