Paper title

Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus

Author

Bestgen, Yves

Abstract

A recent study has shown that, compared to human translations, neural machine translations contain more strongly-associated formulaic sequences made of relatively high-frequency words, but far fewer strongly-associated formulaic sequences made of relatively rare words. These results were obtained on translations of quality newspaper articles, for which the human translations can be assumed to be not very literal. The present study attempts to replicate this research using a parliamentary corpus. The texts were translated from French into English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator. The results confirm the observations made on the news corpus, but the differences are smaller. They suggest that text genres that usually yield more literal translations, such as parliamentary corpora, may be preferable when comparing human and machine translations. Regarding the differences between the three neural machine translation systems, Google Translate output appears to contain fewer highly collocational bigrams, as identified by the CollGram technique, than DeepL and Microsoft Translator output.
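The contrast above between formulaic sequences of high-frequency words and of rare words is usually operationalised with two bigram association measures. The sketch below assumes, as in common CollGram setups, that each bigram is scored with pointwise mutual information (MI), which favours rare but exclusive pairs, and with the t-score, which favours frequent pairs. The function name `bigram_association` and the toy corpus are hypothetical; the actual technique scores the bigrams of a text against a large reference corpus, not against the text itself, and its exact formulas and thresholds may differ.

```python
import math
from collections import Counter


def bigram_association(tokens):
    """Return {(w1, w2): (mi, t_score)} for every bigram in a toy reference corpus.

    Assumed formulas (hypothetical stand-in for the CollGram scoring step):
      E  = f(w1) * f(w2) / N        expected count under independence
      MI = log2(O / E)              high for rare, exclusive pairs
      t  = (O - E) / sqrt(O)        high for frequent pairs
    where O is the observed bigram count and N the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected co-occurrence count if w1 and w2 were independent
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


if __name__ == "__main__":
    # Toy corpus: "a b" is a frequent pair, "c d" a pair of rare words
    scores = bigram_association("a b a b c d".split())
    for pair, (mi, t) in sorted(scores.items()):
        print(pair, f"MI={mi:.3f}", f"t={t:.3f}")
```

On this toy corpus the pair of frequent words `("a", "b")` gets the higher t-score, while the pair of rare words `("c", "d")` gets the higher MI, which mirrors the two kinds of formulaic sequences the abstract distinguishes.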