arXiv Open Access 2025

QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs

David Beauchemin, Pier-Luc Veilleux, Johanna-Pascale Roy, Richard Khoury

Abstract

In this paper, we introduce the Quebec-French Benchmark of Linguistic Minimal Pairs (QFrBLiMP), a corpus designed to evaluate LLMs' linguistic knowledge of prominent grammatical phenomena in Quebec-French. QFrBLiMP comprises 1,761 minimal pairs annotated with 20 linguistic phenomena (LPs). The minimal pairs were created by manually modifying sentences extracted from an official online resource maintained by a Québec government institution. Each pair is annotated by 12 native speakers of Quebec-French, who select the sentence of the pair they consider grammatical. These annotations are used to compare the competency of LLMs with that of humans. We evaluate different LLMs on QFrBLiMP and MultiBLiMP-Fr by measuring, for each category, how often a model assigns the higher probability to the grammatical sentence of a minimal pair. We find that while grammatical competence scales with model size, a clear hierarchy of difficulty emerges. All benchmarked models consistently fail on phenomena requiring deep semantic understanding, revealing a critical limitation. Finally, our statistical analysis comparing QFrBLiMP and MultiBLiMP reveals a significant performance degradation for most models on Quebec-French; however, the most capable models remain within the statistical significance interval, demonstrating cross-dialectal robustness.
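The evaluation described in the abstract, scoring a model by how often it prefers the grammatical sentence of each pair within a category, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `MinimalPair` type, the `accuracy_by_category` function, the category names, the example sentences, and the log-probability values are all hypothetical, and the scorer is a lookup table standing in for a real language model.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class MinimalPair(NamedTuple):
    category: str       # linguistic phenomenon label (hypothetical names below)
    grammatical: str    # the acceptable sentence
    ungrammatical: str  # the minimally different, unacceptable sentence


def accuracy_by_category(
    pairs: Iterable[MinimalPair],
    log_prob: Callable[[str], float],
) -> Dict[str, float]:
    """Per category, the share of pairs where the grammatical sentence scores higher."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair in pairs:
        total[pair.category] += 1
        # A pair counts as correct only on a strict win; ties count as failures.
        if log_prob(pair.grammatical) > log_prob(pair.ungrammatical):
            correct[pair.category] += 1
    return {category: correct[category] / total[category] for category in total}


# Stand-in scorer: made-up log-probabilities, NOT real model output.
TOY_SCORES = {
    "Les enfants jouent dehors.": -12.0,
    "Les enfants joue dehors.": -15.5,
    "Elle a mangé la pomme.": -10.0,
    "Elle a mangée la pomme.": -9.0,
}

# Invented pairs for illustration; they are not taken from QFrBLiMP.
TOY_PAIRS = [
    MinimalPair("subject_verb_agreement",
                "Les enfants jouent dehors.", "Les enfants joue dehors."),
    MinimalPair("past_participle_agreement",
                "Elle a mangé la pomme.", "Elle a mangée la pomme."),
]

result = accuracy_by_category(TOY_PAIRS, TOY_SCORES.__getitem__)
print(result)
```

In practice, `log_prob` would be replaced by a function that sums a model's token log-probabilities over the sentence. The same per-category accuracy could then be computed from the human annotations to compare models with native speakers.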



Citation Format

Beauchemin, D., Veilleux, P.-L., Roy, J.-P., & Khoury, R. (2025). QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs. https://arxiv.org/abs/2509.25664

Journal Information
Publication year: 2025
Language: English (en)
Source database: arXiv
Access: Open Access