Title
Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
Authors
Abstract
This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality control with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 in the constrained setting and .995 in the unconstrained setting. In the zero-shot setting for English-to-Russian and English-to-Italian, we scored an average accuracy of .590 in the constrained setting and .659 in the unconstrained setting.
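The abstract does not spell out how hypotheses are re-ranked, so the following is only a minimal sketch of the general idea: each candidate in an n-best list receives its model score plus a weighted formality term, and the highest combined score wins. The marker lists, the `formality_score` heuristic, the `rerank` function, and the `weight` parameter are all hypothetical and chosen for illustration; they are not the authors' scoring function.

```python
from typing import List, Tuple

# Hypothetical toy marker lists for German; a real system would use a
# trained formality classifier or annotated phrase lists instead.
FORMAL_MARKERS = {"Sie", "Ihnen", "Ihr", "Ihre"}       # matched case-sensitively
INFORMAL_MARKERS = {"du", "dich", "dir", "dein", "deine"}  # matched lowercased


def formality_score(text: str, target: str) -> float:
    """Return a score in [-1, 1]: positive when the text matches `target`.

    `target` is either "formal" or "informal". Texts with no markers score 0.
    """
    tokens = [tok.strip(".,!?;:") for tok in text.split()]
    formal = sum(tok in FORMAL_MARKERS for tok in tokens)
    informal = sum(tok.lower() in INFORMAL_MARKERS for tok in tokens)
    total = formal + informal
    if total == 0:
        return 0.0
    signed = (formal - informal) / total
    return signed if target == "formal" else -signed


def rerank(
    nbest: List[Tuple[str, float]], target: str, weight: float = 1.0
) -> List[Tuple[str, float]]:
    """Sort (hypothesis, model_log_prob) pairs by the combined objective.

    Combined score = model_log_prob + weight * formality_score.
    With weight = 0 this reduces to the original model ranking.
    """
    return sorted(
        nbest,
        key=lambda hyp: hyp[1] + weight * formality_score(hyp[0], target),
        reverse=True,
    )


# Assumed example n-best list with made-up log-probabilities.
nbest = [
    ("Kannst du mir helfen?", -0.4),
    ("Können Sie mir helfen?", -0.6),
]
best_formal = rerank(nbest, target="formal")[0][0]
best_informal = rerank(nbest, target="informal")[0][0]
```

In this sketch, `weight` trades off translation-model confidence against the formality preference; setting it too high could promote a poor translation merely because it contains the desired markers.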