Paper Title
One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation
Paper Authors
Paper Abstract
Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: the source sentence may have multiple correct translations, but the loss function is calculated only according to the reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing the target with the output from an autoregressive model. However, the multi-modality problem in the distilled dataset is still non-negligible. Furthermore, learning from a specific teacher limits the upper bound of the model capability, restricting the potential of NAT models. In this paper, we argue that one reference is not enough and propose diverse distillation with reference selection (DDRS) for NAT. Specifically, we first propose a method called SeedDiv for diverse machine translation, which enables us to generate a dataset containing multiple high-quality reference translations for each source sentence. During training, we compare the NAT output with all references and select the one that best fits the NAT output to train the model. Experiments on widely-used machine translation benchmarks demonstrate the effectiveness of DDRS, which achieves 29.82 BLEU with only one decoding pass on WMT14 En-De, improving the state-of-the-art performance for NAT by over 1 BLEU. Source code: https://github.com/ictnlp/DDRS-NAT
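The reference-selection step described in the abstract can be illustrated with a minimal sketch: compute the NAT loss against every candidate reference and train only on the best-fitting one. The helper name `reference_selection_loss` and the use of a plain token-level negative log-likelihood are assumptions for illustration; the released implementation may compute the per-reference loss differently (e.g., with an alignment-based criterion), so this is a sketch of the idea rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def reference_selection_loss(log_probs, references, pad_id=0):
    """Hypothetical sketch of reference selection for one source sentence.

    log_probs:  (T, V) token-level log-probabilities from the NAT decoder
                (output length T, vocabulary size V).
    references: list of 1-D LongTensors, each a candidate reference
                assumed to be already aligned/padded to length T.
    Returns the loss against the reference that best fits the NAT output.
    """
    losses = []
    for ref in references:
        # Negative log-likelihood against this reference, ignoring padding.
        nll = F.nll_loss(log_probs, ref, ignore_index=pad_id, reduction="mean")
        losses.append(nll)
    # Select (and backpropagate through) the lowest per-reference loss.
    return torch.stack(losses).min()
```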