Paper Title

Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation

Authors

Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Abstract

Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.
