Fine-grained linguistic evaluation for state-of-the-art Machine Translation

Title

Fine-grained linguistic evaluation for state-of-the-art Machine Translation

Authors

Avramidis, Eleftherios; Macketanz, Vivien; Strohriegel, Ursula; Burchardt, Aljoscha; Möller, Sebastian

Abstract

This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems of the Fifth Conference on Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories, based on about 5,500 test items and including a manual annotation effort of 45 person-hours. Two systems (Tohoku and Huoshan) appear to have significantly better test suite accuracy than the others, although in a macro-average the best system of WMT20 is not significantly better than the best system of WMT19. Additionally, we identify some linguistic phenomena where all systems suffer (such as idioms, resultative predicates and the pluperfect), but we are also able to identify particular weaknesses of individual systems (such as quotation marks, lexical ambiguity and sluicing). Most of the WMT19 systems that submitted new versions this year show improvements.
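
A macro-average gives every category the same weight regardless of how many test items it contains, so it can rank systems differently than accuracy pooled over all items. The sketch below illustrates that distinction only. It assumes each test item is stored as a hypothetical (category, passed) pair. The function names and toy data are made up for illustration, and the sketch does not reproduce the paper's exact aggregation or its significance testing.

```python
from collections import defaultdict

def category_accuracies(results):
    """Map each category to the fraction of its test items that passed.

    `results` is an assumed list of (category, passed) pairs.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        if ok:
            passed[category] += 1
    return {c: passed[c] / totals[c] for c in totals}

def macro_average(results):
    """Unweighted mean of per-category accuracies: each category counts equally."""
    accs = category_accuracies(results)
    return sum(accs.values()) / len(accs)

def micro_average(results):
    """Accuracy pooled over all items: larger categories weigh more."""
    return sum(1 for _, ok in results if ok) / len(results)

# Toy data with hypothetical categories "A" and "B"; not results from the paper.
toy = [("A", True), ("A", False),
       ("B", True), ("B", True), ("B", True), ("B", False)]
print(f"macro={macro_average(toy):.3f} micro={micro_average(toy):.3f}")
# prints: macro=0.625 micro=0.667
```

Here category A (1 of 2 passed) and category B (3 of 4 passed) average to 0.625 at the category level. Pooling all six items gives 4/6, because the larger category B dominates.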
