Paper title
Designing the Business Conversation Corpus
Authors
Abstract
While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benefits from its use.