Paper title

Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

Authors

Yu-Siang Wang, Yen-Ling Kuo, Boris Katz

Abstract

We demonstrate how to practically incorporate multi-step future information into the decoder of maximum likelihood sequence models. We propose a "k-step look-ahead" module that considers the likelihood information of rollouts up to k steps. Unlike other approaches that need to train a separate value network to evaluate the rollouts, this look-ahead module can be applied directly to improve the decoding of any sequence model trained in a maximum likelihood framework. We evaluate our look-ahead module on three datasets of varying difficulty: IM2LATEX-100k (OCR, image to LaTeX), WMT16 multimodal machine translation, and WMT14 machine translation. Our look-ahead module improves performance on the simpler datasets, IM2LATEX-100k and WMT16 multimodal machine translation. However, on the more difficult dataset (e.g., one containing longer sequences), WMT14 machine translation, the improvement becomes marginal. Our further investigation using the k-step look-ahead suggests that the more difficult tasks suffer from an overestimated EOS (end-of-sentence) probability. We argue that the overestimated EOS probability also causes beam search performance to degrade as the beam width increases. We tackle the EOS problem by integrating an auxiliary EOS loss into training, which estimates whether the model should emit EOS or another word. Our experiments show that improving EOS estimation increases not only the performance of our proposed look-ahead module but also the robustness of beam search.
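To make the idea of scoring a next token by a k-step rollout concrete, here is a minimal sketch. It assumes one possible reading of the module: each candidate next token is scored by its own log-probability plus the best cumulative log-probability reachable in the following k-1 steps, found by exhaustive search, with a rollout stopping early once it emits EOS. This is not necessarily the authors' exact formulation. The functions `best_rollout_score`, `lookahead_decode`, and `toy_model`, the `LogProbFn` interface, and the bigram probabilities are all hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Dict, List, Optional

EOS = "</s>"

# Hypothetical model interface: maps a prefix to next-token log-probabilities.
LogProbFn = Callable[[List[str]], Dict[str, float]]


def best_rollout_score(model: LogProbFn, prefix: List[str], depth: int) -> float:
    """Best cumulative log-probability over all continuations of `prefix`
    with at most `depth` more tokens; a rollout stops early at EOS."""
    if depth == 0 or (prefix and prefix[-1] == EOS):
        return 0.0
    best = -math.inf
    for tok, lp in model(prefix).items():
        best = max(best, lp + best_rollout_score(model, prefix + [tok], depth - 1))
    return best


def lookahead_decode(model: LogProbFn, k: int, max_len: int = 20) -> List[str]:
    """Greedy decoding in which each next token is picked by its k-step score.
    With k == 1 the rollout term is always 0, so this reduces to plain greedy."""
    out: List[str] = []
    while len(out) < max_len and (not out or out[-1] != EOS):
        scores = {
            tok: lp + best_rollout_score(model, out + [tok], k - 1)
            for tok, lp in model(out).items()
        }
        out.append(max(scores, key=scores.get))
    return out


# Hypothetical bigram table: next-token log-probs keyed by the last token.
TABLE: Dict[Optional[str], Dict[str, float]] = {
    None: {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"c": math.log(0.5), "d": math.log(0.5)},
    "b": {"e": math.log(1.0)},
    "c": {EOS: math.log(1.0)},
    "d": {EOS: math.log(1.0)},
    "e": {EOS: math.log(1.0)},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    return TABLE[prefix[-1] if prefix else None]


print(lookahead_decode(toy_model, k=1))  # ['a', 'c', '</s>']
print(lookahead_decode(toy_model, k=2))  # ['b', 'e', '</s>']
```

Greedy decoding (k=1) commits to "a" because it has the higher one-step probability (0.6). Looking two steps ahead shows that every path through "a" has probability at most 0.3, while "b" leads to "e" with total probability 0.4. The exhaustive rollout costs on the order of |V|^(k-1) model calls per candidate, so a practical version would likely restrict rollouts to top-scoring candidates. Note also that a rollout that emits EOS stops accumulating negative log-probabilities. Under this scoring rule, that favors early termination whenever the model assigns EOS a high probability, which is one way to see how an overestimated EOS probability can undermine look-ahead.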