Paper Title
Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019
Paper Authors
Abstract
We reassess the claims of human parity and super-human performance made at the news shared task of WMT 2019 for three translation directions: English-to-German, English-to-Russian and German-to-English. First we identify three potential issues in the human evaluation of that shared task: (i) the limited amount of intersentential context available, (ii) the limited translation proficiency of the evaluators and (iii) the use of a reference translation. We then conduct a modified evaluation taking these issues into account. Our results indicate that all the claims of human parity and super-human performance made at WMT 2019 should be refuted, except the claim of human parity for English-to-German. Based on our findings, we put forward a set of recommendations and open questions for future assessments of human parity in machine translation.