Paper Title

SALTED: A Framework for SAlient Long-Tail Translation Error Detection

Authors

Vikas Raunak, Matt Post, Arul Menezes

Abstract

Traditional machine translation (MT) metrics provide an average measure of translation quality that is insensitive to the long tail of behavioral problems in MT. Examples include translation of numbers, physical units, dropped content, and hallucinations. These errors, which occur rarely and unpredictably in Neural Machine Translation (NMT), greatly undermine the reliability of state-of-the-art MT systems. Consequently, it is important to have visibility into these problems during model development. To this end, we introduce SALTED, a specifications-based framework for behavioral testing of MT models that provides fine-grained views of salient long-tail errors, permitting trustworthy visibility into previously invisible problems. At the core of our approach is the development of high-precision detectors that flag errors (or, alternatively, verify output correctness) between a source sentence and a system output. We demonstrate that such detectors can be used not only to identify salient long-tail errors in MT systems, but also to filter training data with higher recall, fix targeted errors through model fine-tuning in NMT, and generate novel data for metamorphic testing that elicits further bugs in models.
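To make the core idea concrete, here is a minimal, hypothetical sketch of one kind of high-precision detector the abstract describes: a check that numeric tokens in the source sentence survive translation intact. The function name, regex, and matching policy are illustrative assumptions, not the paper's actual specification; SALTED's detectors are specification-based and handle far more cases (units, formats, locales).

```python
import re

def detect_number_mismatch(source: str, translation: str) -> bool:
    """Toy SALTED-style detector: flag a source/output pair whose
    numeric tokens disagree.

    Extracts numbers (with optional decimal part) from both sentences
    and flags the pair if the sorted lists differ. Illustrative only;
    the real framework's detectors are more careful.
    """
    def extract(sentence: str) -> list[str]:
        # Capture integers and decimals like "1500" or "3.5" / "3,5"
        return sorted(re.findall(r"\d+(?:[.,]\d+)?", sentence))

    return extract(source) != extract(translation)
```

For example, a German output that renders "1500 dollars" as "150 Dollar" would be flagged, while a faithful rendering would pass. Such a detector trades recall for precision: it only fires when it is nearly certain an error occurred, which is what makes it usable for both error surfacing and training-data filtering.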
