arXiv Open Access 2021

A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

Ziyi Yang, Yinfei Yang, Daniel Cer, Eric Darve

Abstract

Language-agnostic representation and the isolation of semantic and language information form an emerging research direction for multilingual representation models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from the semantics-related components of multilingual representations pre-trained on multi-monolingual data. LIR is a post-training, model-agnostic method that uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of the semantic space primarily encode language identity information. We first evaluate LIR on a cross-lingual question answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in MAP for weak-alignment models. We then evaluate LIR on the Amazon Reviews and XEVAL datasets and observe that removing language information improves cross-lingual transfer performance.
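The abstract names the two linear operations LIR relies on: matrix factorization and orthogonal projection. The following is a minimal sketch of how they might fit together. It rests on several assumptions that are not taken from the abstract. For each language, the top-r left singular vectors of that language's (uncentered) embedding matrix are taken to span its "language identity" subspace. Each embedding of that language is then projected onto the orthogonal complement, e' = e - C C^T e, and re-normalized for cosine-similarity retrieval. The function names, the value of r, and the random stand-in embeddings are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def language_subspace(embeddings: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: top-r principal directions of one language.

    embeddings: shape (n, d), one row per sentence in a single language.
    Returns C with shape (d, r) and orthonormal columns.
    """
    # SVD of the d x n matrix whose columns are the embeddings (no centering,
    # an assumption of this sketch); left singular vectors live in R^d.
    u, _, _ = np.linalg.svd(embeddings.T, full_matrices=False)
    return u[:, :r]


def remove_language_info(embeddings: np.ndarray, basis: np.ndarray,
                         normalize: bool = True) -> np.ndarray:
    """Project each row e onto the orthogonal complement: e - C C^T e."""
    # Row-wise form of e' = e - C C^T e is E' = E - E C C^T.
    projected = embeddings - embeddings @ basis @ basis.T
    if normalize:
        # Re-normalize to unit length (assumption: cosine-similarity retrieval).
        norms = np.linalg.norm(projected, axis=1, keepdims=True)
        projected = projected / np.maximum(norms, 1e-12)
    return projected


# Usage with random stand-in "English" embeddings (hypothetical data).
rng = np.random.default_rng(0)
en = rng.normal(size=(100, 16))
C_en = language_subspace(en, r=2)
en_clean = remove_language_info(en, C_en)

# Every cleaned embedding is orthogonal to the removed directions.
print(np.abs(en_clean @ C_en).max())
```

In this sketch the subspace is estimated separately per language, so each language's embeddings lose their own dominant directions before cross-lingual comparison. Since C^T C = I, the projected rows satisfy E' C = 0 exactly, up to floating-point error.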

Topics & Keywords

cs.CL cs.AI

Citation

Yang, Z., Yang, Y., Cer, D., & Darve, E. (2021). A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations. https://arxiv.org/abs/2109.04727

Publication Information

Year Published: 2021
Language: en
Database Source: arXiv
Access: Open Access ✓