Paper Title
Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task
Paper Authors
Paper Abstract
In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system on the core idea of UNITE (Unified Translation Evaluation), which unifies the source-only, reference-only, and source-reference-combined evaluation scenarios into a single model. Specifically, during the model pre-training phase, we first apply pseudo-labeled data examples to continuously pre-train UNITE. Notably, to reduce the gap between pre-training and fine-tuning, we use data cropping and a ranking-based score normalization strategy. During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Finally, we collect results from models with different pre-trained language model backbones, and apply different ensembling strategies across the involved translation directions.
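To make two of the ideas in the abstract concrete, the following minimal Python sketch illustrates (1) how a UNITE-style model can accept source-only, reference-only, and source-reference-combined inputs through a single input-construction routine, and (2) one plausible form of ranking-based score normalization for pseudo-labeled scores. This is not the authors' released code; the function names, the separator token, and the choice of normalizing ranks to the [0, 1] range are assumptions made purely for illustration.

```python
# Illustrative sketch only (assumptions: function names, separator token,
# and the [0, 1] rank-normalization target are not taken from the paper).
from typing import Optional

import numpy as np


def build_unite_inputs(hyp: str, src: Optional[str], ref: Optional[str]) -> str:
    """Concatenate the hypothesis with whatever signals are available, so one
    model can handle source-only, reference-only, and source-reference-combined
    evaluation scenarios."""
    parts = [hyp]
    if src is not None:
        parts.append(src)   # present in source-only and combined scenarios
    if ref is not None:
        parts.append(ref)   # present in reference-only and combined scenarios
    return " </s> ".join(parts)  # separator token is an assumption


def rank_normalize(scores: np.ndarray) -> np.ndarray:
    """Replace raw pseudo-label scores by their ranks, rescaled to [0, 1],
    so that synthetic scores better match the score distribution seen during
    fine-tuning on human judgments."""
    ranks = scores.argsort().argsort()          # rank of each score (0 = lowest)
    return ranks / max(len(scores) - 1, 1)      # map ranks onto [0, 1]


if __name__ == "__main__":
    # Source-only scenario: only the source sentence accompanies the hypothesis.
    print(build_unite_inputs("Das ist gut.", src="That is good.", ref=None))
    # Raw pseudo-labeled scores on arbitrary scales become comparable ranks.
    print(rank_normalize(np.array([0.2, 3.5, -1.0, 0.9])))
```

The design point this sketch tries to capture is that a single encoder sees all three input formats during training, so at inference time any subset of {source, reference} can be supplied without switching models.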