arXiv Open Access 2024

MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

Zixian Huang Wenhao Zhu Gong Cheng Lei Li Fei Yuan

Abstract

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translations, to circumvent the challenge of non-English understanding. Unfortunately, these methods often underutilize the built-in reasoning skills and useful language understanding capabilities of LLMs. To better utilize both the reasoning and language understanding minds of LLMs, we propose a new method, namely MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, a two-step training scheme is introduced: it first trains to embed the external capabilities into LLMs and then trains the collaborative utilization of the external capabilities and the built-in capabilities of LLMs. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of the LLMs, average accuracy improves by 6.7% across all languages and by 8.0% on low-resource languages on the MGSM dataset.
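The merging idea described in the abstract can be illustrated with a minimal sketch. All names and dimensions below are hypothetical toy stand-ins, not the paper's actual architecture or configuration: a frozen multilingual encoder and a frozen LLM embedding table are stubbed as random lookup tables, and the only trainable piece is a linear mapper that projects the encoder's output into the LLM's embedding space before the two views are concatenated.

```python
import random

ENC_DIM, LLM_DIM = 4, 6  # hypothetical toy dimensions
random.seed(0)

def rand_matrix(rows, cols, scale=0.02):
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

def matvec(mat, vec):
    """Multiply a row vector (len == rows of mat) by a matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

# Frozen stand-ins: embedding tables for the external multilingual
# encoder and for the LLM's own input embeddings.
enc_table = {t: [random.uniform(-1, 1) for _ in range(ENC_DIM)]
             for t in range(100)}
llm_table = {t: [random.uniform(-1, 1) for _ in range(LLM_DIM)]
             for t in range(100)}

# The only trainable piece in this sketch: a linear mapper that
# projects encoder states (ENC_DIM) into the LLM space (LLM_DIM).
# The two-step scheme would first train this mapper alone, then
# train it jointly on the merged inputs, keeping the LLM frozen.
W_map = rand_matrix(ENC_DIM, LLM_DIM)

def merged_inputs(token_ids):
    mapped = [matvec(W_map, enc_table[t]) for t in token_ids]  # external view
    native = [llm_table[t] for t in token_ids]                 # LLM's own view
    # Prepend the mapped external representations to the LLM's own
    # embeddings, so the frozen LLM receives both views of the input.
    return mapped + native

seq = merged_inputs([3, 14, 15])
print(len(seq), len(seq[0]))  # 6 vectors, each LLM_DIM wide
```

The key design point the sketch captures is that the LLM's parameters are never updated: only the small mapping layer learns, which is what makes the approach parameter-efficient.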


Authors (5)

Zixian Huang

Wenhao Zhu

Gong Cheng

Lei Li

Fei Yuan

Citation Format

Huang, Z., Zhu, W., Cheng, G., Li, L., & Yuan, F. (2024). MindMerger: Efficient Boosting LLM Reasoning in non-English Languages. https://arxiv.org/abs/2405.17386

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓