Paper Title

Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation

Paper Authors

Dario Stojanovski, Alexander Fraser

Paper Abstract

Achieving satisfying performance in machine translation on domains for which there is no training data is challenging. Traditional supervised domain adaptation is not suitable for addressing such zero-resource domains because it relies on in-domain parallel data. We show that when in-domain parallel data is not available, access to document-level context enables better capturing of domain generalities compared to only having access to a single sentence. Having access to more information provides a more reliable domain estimate. We present two document-level Transformer models which are capable of using large context sizes, and we compare these models against strong Transformer baselines. We obtain improvements for the two zero-resource domains we study. We additionally provide an analysis in which we vary the amount of context and look at the case where in-domain data is available.
