Re-translation versus Streaming for Simultaneous Translation

Paper title

Re-translation versus Streaming for Simultaneous Translation

Authors

Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, George Foster

Abstract

There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning of an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good as or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation's ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.
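The re-translation strategy described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `toy_translate` is a hypothetical stand-in for an arbitrary MT system, and counting the tokens erased from the previous hypothesis is just one simple way to quantify revisions.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def common_prefix_len(a: Tokens, b: Tokens) -> int:
    """Length of the longest shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def retranslate(source: Tokens,
                translate: Callable[[Tokens], Tokens]) -> Tuple[List[Tokens], int]:
    """Translate every source prefix from scratch.

    Returns the hypothesis produced after each new source token and the
    total number of previously displayed tokens that had to be erased.
    """
    outputs: List[Tokens] = []
    erased = 0
    prev: Tokens = []
    for i in range(1, len(source) + 1):
        hyp = translate(source[:i])  # fresh translation of the prefix
        # Tokens of the previous hypothesis beyond the shared prefix are revised.
        erased += len(prev) - common_prefix_len(prev, hyp)
        outputs.append(hyp)
        prev = hyp
    return outputs, erased


def toy_translate(prefix: Tokens) -> Tokens:
    """Hypothetical word-for-word translator (illustration only).

    "bank" is rendered as "bench" until "geld" appears in the prefix,
    which forces a revision once more context arrives.
    """
    lexicon = {"die": "the", "hat": "has", "geld": "money"}
    out = []
    for tok in prefix:
        if tok == "bank":
            out.append("bank" if "geld" in prefix else "bench")
        else:
            out.append(lexicon.get(tok, tok))
    return out


if __name__ == "__main__":
    hyps, erased = retranslate(["die", "bank", "hat", "geld"], toy_translate)
    for h in hyps:
        print(" ".join(h))
    print("erased tokens:", erased)
```

Because `retranslate` only calls `translate` on prefixes, any translation function can be plugged in unchanged, which mirrors the abstract's point that re-translation can wrap an arbitrary MT system.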

Paper title