Semantic Scholar · Open Access · 2025 · 5 citations

Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States

Jurgita Kapočiūtė-Dzikienė, Toms Bergmanis, Mārcis Pinnis

Abstract

Although large language models (LLMs) have transformed our expectations of modern language technologies, concerns over data privacy often restrict the use of commercially available LLMs hosted outside of EU jurisdictions. This limits their application in governmental, defence, and other data-sensitive sectors. In this work, we evaluate the extent to which locally deployable open-weight LLMs support lesser-spoken languages such as Lithuanian, Latvian, and Estonian. We examine various size and precision variants of the top-performing multilingual open-weight models, Llama 3, Gemma 2, Phi, and NeMo, on machine translation, multiple-choice question answering, and free-form text generation. The results indicate that while certain models like Gemma 2 perform close to the top commercially available models, many LLMs struggle with these languages. Most surprisingly, however, we find that these models, while showing close to state-of-the-art translation performance, are still prone to lexical hallucinations with errors in at least 1 in 20 words for all open-weight multilingual LLMs.

Citation

Kapočiūtė-Dzikienė, J., Bergmanis, T., Pinnis, M. (2025). Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States. https://doi.org/10.48550/arXiv.2501.03952

Journal Information
Publication Year: 2025
Language: en
Total Citations: 5
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2501.03952
Access: Open Access