Paper Title

Nearest Neighbor Knowledge Distillation for Neural Machine Translation

Paper Authors

Zhixian Yang, Renliang Sun, Xiaojun Wan

Paper Abstract

k-nearest-neighbor machine translation (kNN-MT), proposed by Khandelwal et al. (2021), has achieved many state-of-the-art results in machine translation tasks. Although effective, kNN-MT requires conducting a kNN search over a large datastore at every decoding step during inference, which prohibitively increases the decoding cost and makes deployment in real-world applications difficult. In this paper, we propose to move the time-consuming kNN search forward to the preprocessing phase, and then introduce Nearest Neighbor Knowledge Distillation (kNN-KD), which trains the base NMT model to directly learn the knowledge of kNN. Distilling the knowledge retrieved by kNN encourages the NMT model to take more reasonable target tokens into consideration, thus addressing the overcorrection problem. Extensive experimental results show that the proposed method achieves consistent improvements over state-of-the-art baselines, including kNN-MT, while maintaining the same training and decoding speed as a standard NMT model.
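The abstract describes two steps: precomputing kNN retrieval results over a datastore before training, and then distilling the retrieved distributions into the base NMT model. The sketch below only illustrates that general idea and is not the authors' implementation; the names (`datastore_keys`, `datastore_vals`, `knn_distribution`, `knn_kd_loss`), the L2 distance, the softmax temperature, and the interpolation weight `alpha` are assumptions made for illustration, loosely following how kNN-MT-style distributions are typically formed.

```python
# Minimal sketch of the kNN-KD idea, under the assumptions stated above.
# `datastore_keys` are decoder hidden states collected offline, `datastore_vals`
# are their gold target-token ids; hyperparameters are placeholders.
import torch
import torch.nn.functional as F


def knn_distribution(query, datastore_keys, datastore_vals, vocab_size,
                     k=8, temperature=10.0):
    """Turn the k nearest datastore entries of each query state into a soft
    distribution over the vocabulary (computed once, during preprocessing)."""
    dists = torch.cdist(query, datastore_keys)              # (batch, n_entries) L2 distances
    knn_dists, knn_idx = dists.topk(k, largest=False)       # k closest entries per query
    weights = F.softmax(-knn_dists / temperature, dim=-1)   # closer neighbors get more mass
    probs = torch.zeros(query.size(0), vocab_size)
    probs.scatter_add_(1, datastore_vals[knn_idx], weights) # accumulate mass per token id
    return probs


def knn_kd_loss(student_logits, knn_probs, gold_ids, alpha=0.5):
    """Interpolate the usual NLL loss with a KD term that pulls the model's
    per-step distribution toward the precomputed kNN distribution."""
    log_p = F.log_softmax(student_logits, dim=-1)
    nll = F.nll_loss(log_p, gold_ids)
    kd = F.kl_div(log_p, knn_probs, reduction="batchmean")
    return (1 - alpha) * nll + alpha * kd
```

Because the kNN distributions are produced ahead of time, the training loop only adds a cheap KL term, and decoding uses the plain NMT model with no datastore lookup, which is consistent with the claim of unchanged training and decoding speed.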
