Title

Incorporating Terminology Constraints in Automatic Post-Editing

Authors

David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat, Kathleen McKeown

Abstract

Users of machine translation (MT) may want to ensure the use of specific lexical terminology. While there exist techniques for incorporating terminology constraints during MT inference, current automatic post-editing (APE) approaches cannot ensure that these terms will appear in the final translation. In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating that our approach preserves 95% of the terminology constraints and also improves translation quality on English-German benchmarks. Even when applied to lexically constrained MT output, our approach further improves terminology preservation. However, we show that our models do not learn to copy constraints systematically, and we suggest a simple data augmentation technique that leads to improved performance and robustness.
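The 95% figure measures how many of the required terms actually appear in the post-edited output. Below is a minimal sketch of a term preservation rate, meant only as an illustration. It assumes whitespace-tokenized outputs and exact, case-sensitive matching of whole tokens. The paper's exact matching rule is not stated here, and the function name `term_preservation_rate` is hypothetical.

```python
def term_preservation_rate(outputs, constraints):
    """Fraction of constraint terms that appear verbatim in the matching output.

    outputs:     list of post-edited sentences (str), assumed pre-tokenized
    constraints: list of lists of target-side terms (str), one list per sentence
    Matching is exact and case-sensitive on whole whitespace-separated tokens
    (an assumption; punctuation attached to a word will prevent a match).
    """
    if len(outputs) != len(constraints):
        raise ValueError("outputs and constraints must have the same length")

    total = 0
    found = 0
    for hyp, terms in zip(outputs, constraints):
        # Pad with spaces so a term only matches on token boundaries.
        padded_hyp = " " + " ".join(hyp.split()) + " "
        for term in terms:
            total += 1
            padded_term = " " + " ".join(term.split()) + " "
            if padded_term in padded_hyp:
                found += 1

    # With no constraints at all, treat preservation as vacuously complete.
    return found / total if total else 1.0


outputs = ["Das Haus ist groß", "Ich mag Katzen"]
constraints = [["Haus"], ["Hunde", "Katzen"]]
print(term_preservation_rate(outputs, constraints))
```

In the example above, two of the three required terms ("Haus" and "Katzen") appear, so the rate is 2/3. A real evaluation would more likely normalize tokenization and casing before matching.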