Paper Title

Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

Authors

Bryan Eikema, Wilker Aziz

Abstract

Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account the translation distribution holistically. We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
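
The contrast the abstract draws between picking the mode and using a decision rule that considers the whole distribution can be illustrated with a small sketch of sampling-based minimum Bayes risk decoding. This is not the authors' implementation. The function names `unigram_f1` and `mbr_decode`, the unigram F1 utility, and the example sentences are all assumptions chosen to keep the example self-contained; a real system would draw the samples from an NMT model and would likely use a stronger utility.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility (an assumption): F1 overlap of whitespace-split unigrams."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples: list[str], utility=unigram_f1) -> str:
    """Return the sample with the highest average utility against all samples.

    The samples serve both as candidates and as pseudo-references, so the
    average is a Monte Carlo estimate of expected utility under the model.
    """
    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, ref) for ref in samples) / len(samples)

    return max(samples, key=expected_utility)


# Hypothetical samples for one source sentence (not real model output).
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat sat on the mat",
    "",  # a degenerate empty translation
]
best = mbr_decode(samples)
print(best)
```

In this toy setting the empty string has zero expected utility because it shares nothing with the other samples, so it is never selected, whereas a rule that looks only at the single highest-scoring string has no such safeguard.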
