
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

Mitchell A. Gordon Kevin Duh

Abstract

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
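The recipe suggested by the abstract (distill on general-domain data, adapt the teacher in-domain, then distill again from the adapted teacher) can be summarized as the following minimal sketch. This is an illustration of the described pipeline, not the authors' code: the function names (`train`, `finetune`, `beam_translate`) and the choice to fine-tune the student in the second distillation step are assumptions introduced here.

```python
# Hypothetical sketch of the "distill twice" recipe described in the abstract.
# train(bitext) -> model, finetune(model, bitext) -> model, and
# beam_translate(model, sources) -> list of translations are placeholder callables.

def distill_adapt_distill(general_bitext, in_domain_bitext,
                          train, finetune, beam_translate):
    """general_bitext / in_domain_bitext: lists of (source, target) sentence pairs."""
    # 1. Train a large general-domain teacher on real parallel data.
    teacher = train(general_bitext)

    # 2. First distillation: sequence-level KD on general-domain data.
    #    The student learns from teacher translations of the source side.
    general_sources = [src for src, _ in general_bitext]
    distilled_general = list(zip(general_sources,
                                 beam_translate(teacher, general_sources)))
    student = train(distilled_general)

    # 3. Adapt the teacher to the target domain.
    teacher = finetune(teacher, in_domain_bitext)

    # 4. Second distillation: continue training the student on the adapted
    #    teacher's translations of in-domain source sentences.
    in_domain_sources = [src for src, _ in in_domain_bitext]
    distilled_in_domain = list(zip(in_domain_sources,
                                   beam_translate(teacher, in_domain_sources)))
    student = finetune(student, distilled_in_domain)
    return student
```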

Authors (2)

Mitchell A. Gordon

Kevin Duh

Citation

Gordon, M. A., & Duh, K. (2020). Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation. arXiv:2003.02877. https://arxiv.org/abs/2003.02877

Journal Information

Publication Year: 2020
Language: en
Source Database: arXiv
Access: Open Access ✓