Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Mitchell A. Gordon
Kevin Duh
Abstract
We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
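In pipeline form, the "distill twice" recipe amounts to three steps: distill a large general-domain teacher into a small student, adapt the teacher to the target domain, then distill the adapted teacher into the student. The Python sketch below only illustrates that recipe under assumed interfaces; train, finetune, and translate are hypothetical stand-ins for a real NMT toolkit, and the corpora are toy placeholders, not the authors' data or code.

```python
# Illustrative sketch of the "distill, adapt, distill" recipe from the abstract.
# train(), finetune(), and translate() are hypothetical placeholders for a real
# NMT toolkit; the toy corpora below stand in for real parallel data.

def train(size, src, tgt):
    # Placeholder: train an NMT model of the given size on a parallel corpus.
    return {"size": size, "data": list(zip(src, tgt))}

def finetune(model, src, tgt):
    # Placeholder: continue training an existing model on new parallel data.
    model["data"] += list(zip(src, tgt))
    return model

def translate(model, src):
    # Placeholder: beam-search decode source sentences with the given model.
    return [f"<translation of {s!r}>" for s in src]

general_src, general_tgt = ["hello world"], ["hallo welt"]           # toy general-domain data
domain_src, domain_tgt = ["myocardial infarction"], ["herzinfarkt"]  # toy in-domain data

# 1) Distill: train a large general-domain teacher, then train a small student
#    on the teacher's own translations (sequence-level knowledge distillation).
teacher = train("large", general_src, general_tgt)
student = train("small", general_src, translate(teacher, general_src))

# 2) Adapt: fine-tune the teacher on in-domain parallel data.
teacher = finetune(teacher, domain_src, domain_tgt)

# 3) Distill again: fine-tune the student on the adapted teacher's in-domain output.
student = finetune(student, domain_src, translate(teacher, domain_src))
```

Steps 1 and 3 are the two distillation passes: in sequence-level knowledge distillation the student trains on the teacher's beam-search translations of the source side rather than on the reference targets.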
Journal Information
- Publication Year: 2020
- Language: en
- Database Source: arXiv
- Access: Open Access ✓