Paper Title
Consistent Human Evaluation of Machine Translation across Language Pairs
Paper Authors
Paper Abstract
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality across different language pairs. We propose a new metric called XSTS that is more focused on semantic equivalence, along with a cross-lingual calibration method that enables more consistent assessment. We demonstrate the effectiveness of these novel contributions in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
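The abstract does not spell out the calibration procedure, but the core idea of cross-lingual calibration can be sketched: all evaluators score a shared calibration set, and each evaluator's study scores are shifted by the gap between their calibration-set mean and the pool-wide mean. The snippet below is a minimal illustration of that idea under the assumption of a simple mean-offset adjustment on the 1-5 XSTS scale; the names (calibrate_scores, study, calib) are hypothetical and not from the paper.

```python
import numpy as np

def calibrate_scores(study_scores, calibration_scores):
    """Shift each evaluator's study scores by the gap between the pool-wide
    mean and that evaluator's own mean on a shared calibration set.

    study_scores:       dict evaluator -> list of XSTS scores on study items
    calibration_scores: dict evaluator -> list of XSTS scores on the shared set
    """
    # Pool-wide mean over every score given on the shared calibration set.
    global_mean = np.mean(
        [s for scores in calibration_scores.values() for s in scores]
    )
    adjusted = {}
    for evaluator, scores in study_scores.items():
        # A lenient evaluator (high calibration mean) gets a negative offset.
        offset = global_mean - np.mean(calibration_scores[evaluator])
        # Apply the shift, then clip back to the 1-5 XSTS range.
        adjusted[evaluator] = np.clip(
            np.asarray(scores, dtype=float) + offset, 1.0, 5.0
        )
    return adjusted

# Toy usage: eval_a is systematically lenient, eval_b systematically strict;
# after calibration their scores become directly comparable.
study = {"eval_a": [4, 5, 4], "eval_b": [2, 3, 3]}
calib = {"eval_a": [5, 5, 4], "eval_b": [3, 3, 2]}
print(calibrate_scores(study, calib))
```

In this toy example the lenient evaluator's scores are pulled down by 1.0 and the strict evaluator's up by 1.0, so both end up near the pool mean; the actual paper's calibration may differ in its exact adjustment rule.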