Paper Title

Onception: Active Learning with Expert Advice for Real World Machine Translation

Authors

Vânia Mendonça, Ricardo Rei, Luisa Coheur, Alberto Sardinha

Abstract

Active learning can play an important role in low-resource settings (i.e., where annotated data is scarce) by selecting which instances are most worth annotating. Most active learning approaches for Machine Translation assume the existence of a pool of sentences in a source language and rely on human annotators to provide translations or post-edits, which can still be costly. In this article, we assume a real-world human-in-the-loop scenario in which: (i) the source sentences may not be readily available, but instead arrive in a stream; (ii) the automatic translations receive feedback in the form of a rating, instead of a correct/edited translation, since the human in the loop might be a user looking for a translation who is not able to provide one. To tackle the challenge of deciding whether each incoming pair of a source sentence and its translations is worth querying for human feedback, we resort to a number of stream-based active learning query strategies. Moreover, since we do not know in advance which query strategy will be the most adequate for a certain language pair and set of Machine Translation models, we propose to dynamically combine multiple strategies using prediction with expert advice. Our experiments show that using active learning allows convergence to the best Machine Translation systems with fewer human interactions. Furthermore, combining multiple strategies using prediction with expert advice often outperforms several individual active learning strategies with even fewer interactions.
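As an illustration of the idea of combining query strategies through prediction with expert advice, here is a minimal sketch using an exponentially weighted forecaster. It is not the authors' implementation: the class name `ExpertCombiner`, the learning rate `eta`, the 0.5 voting threshold, and the way losses are assigned to strategies are all assumptions made for this example.

```python
import math


class ExpertCombiner:
    """Hypothetical combiner: each expert is one query strategy."""

    def __init__(self, n_experts, eta=0.5):
        # Every strategy starts with the same weight.
        self.weights = [1.0] * n_experts
        self.eta = eta  # assumed learning rate

    def decide(self, votes):
        """votes[i] is 1 if strategy i wants to query the human, else 0.

        Returns True when the weighted share of 'query' votes is at
        least 0.5 (an assumed threshold).
        """
        total = sum(self.weights)
        score = sum(w * v for w, v in zip(self.weights, votes)) / total
        return score >= 0.5

    def update(self, losses):
        """losses[i] in [0, 1]; a higher loss shrinks that strategy's weight."""
        self.weights = [
            w * math.exp(-self.eta * loss)
            for w, loss in zip(self.weights, losses)
        ]


combiner = ExpertCombiner(n_experts=3)
votes = [1, 0, 0]  # only the first strategy wants to query
first_decision = combiner.decide(votes)  # one vote out of three equal weights

# Assumed feedback: the first strategy was right twice, the others wrong.
combiner.update([0.0, 1.0, 1.0])
combiner.update([0.0, 1.0, 1.0])
later_decision = combiner.decide(votes)  # the first strategy now carries more weight
```

In this sketch, a strategy that repeatedly incurs low loss gains influence over the query decision, so the combination can adapt without knowing in advance which strategy suits a given language pair.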
