Paper Title

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

Authors

Bowen Liang, Pidong Wang, Yuan Cao

Abstract

Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training. While label smoothing offers a desired regularization effect during model training, in this paper we demonstrate that it nevertheless introduces length biases in the beam search decoding procedure. Our analysis shows that label smoothing implicitly applies a length penalty term to output sequence, causing a bias towards shorter translations. We also show that for a model fully optimized with label smoothing, translation length is implicitly upper bounded by a fixed constant independent of input. We verify our theory by applying a simple rectification function at inference time to restore the unbiased distributions from the label-smoothed model predictions. This rectification method led to consistent quality improvements on WMT English-German, English-French, English-Czech and English-Chinese tasks, up to +0.3 BLEU at beam size 4 and +2.8 BLEU at beam size 200.
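The rectification idea can be illustrated with a short sketch. Under the standard label-smoothing formulation, the model is trained toward the mixed target q = (1 − ε)·p + ε/V over a vocabulary of size V, so the unsmoothed distribution can in principle be recovered by inverting that mixture at inference time. The abstract does not specify the paper's exact rectification function, so the `rectify` helper below is an illustrative assumption based on this standard formulation, not the authors' implementation:

```python
import numpy as np

def rectify(smoothed_probs: np.ndarray, eps: float) -> np.ndarray:
    """Invert standard label smoothing q = (1 - eps) * p + eps / V
    to recover an estimate of the unsmoothed distribution p.

    NOTE: illustrative assumption; the paper's exact rectification
    function is not given in the abstract.
    """
    V = smoothed_probs.shape[-1]
    p = (smoothed_probs - eps / V) / (1.0 - eps)
    # Numerical guard: clip tiny negatives and renormalize.
    p = np.clip(p, 0.0, None)
    return p / p.sum(axis=-1, keepdims=True)

# Round-trip check: smooth a distribution, then rectify it back.
p_true = np.array([0.7, 0.2, 0.1])
eps = 0.1
q = (1.0 - eps) * p_true + eps / p_true.shape[-1]
p_hat = rectify(q, eps)
```

Applying such a correction to the per-step probabilities before beam search scoring removes the implicit constant penalty that label smoothing adds to every token, which is the mechanism the abstract identifies as biasing decoding toward shorter outputs.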
