Semantic Scholar · Open Access · 2025 · 16 citations

M-IFEval: Multilingual Instruction-Following Evaluation

Antoine Dussolle, Andrea Cardena Díaz, Shota Sato, Peter Devine

Abstract

Instruction following is a core capability of modern Large language models (LLMs), making evaluating this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.

Authors (4)

Antoine Dussolle

Andrea Cardena Díaz

Shota Sato

Peter Devine

Citation Format

Dussolle, A., Díaz, A.C., Sato, S., Devine, P. (2025). M-IFEval: Multilingual Instruction-Following Evaluation. https://doi.org/10.48550/arXiv.2502.04688

Journal Information

Publication Year: 2025
Language: en
Total Citations: 16
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2502.04688
Access: Open Access ✓