Paper title

Simultaneous Translation for Unsegmented Input: A Sliding Window Approach

Authors

Sukanta Sen, Ondřej Bojar, Barry Haddow

Abstract

In the cascaded approach to spoken language translation (SLT), the ASR output is typically punctuated and segmented into sentences before being passed to MT, since the latter is usually trained on written text. However, erroneous segmentation, caused by poor sentence-final punctuation from the ASR system, degrades translation quality, especially in the simultaneous (online) setting where the input is continuously updated. To reduce the influence of automatic segmentation, we present a sliding window approach to translate raw ASR output (online or offline) without relying on an automatic segmenter. We train translation models on parallel windows (instead of parallel sentences) extracted from the original training data. At test time, we translate at the window level and join the translated windows using a simple approach to generate the final translation. Experiments on English-to-German and English-to-Czech show that our approach improves translation quality by 1.3 to 2.0 BLEU points over the usual ASR-segmenter pipeline, and that the fixed-length window considerably reduces flicker compared to a baseline retranslation-based online SLT system.
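The abstract does not give the window length, the stride, or the rule for joining translated windows. The sketch below illustrates only the input side of the idea: cutting an unsegmented ASR token stream into fixed-length, overlapping windows that could each be translated independently. It assumes whitespace-tokenized input, and `sliding_windows`, `size` and `stride` are hypothetical names and parameters, not the authors' implementation or settings. It does not model how parallel training windows are extracted or how translated windows are merged.

```python
from typing import List


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split an unsegmented token stream into fixed-length, overlapping windows.

    Hypothetical helper: `size` (tokens per window) and `stride` (tokens the
    window advances each step) are illustrative, not the paper's values.
    The last window is aligned to the end of the stream so no token is dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if stride > size:
        # A stride longer than the window would skip tokens between windows.
        raise ValueError("stride must not exceed size")
    if len(tokens) <= size:
        # Short input fits in a single window.
        return [tokens[:]]

    windows = []
    start = 0
    while start + size < len(tokens):
        windows.append(tokens[start:start + size])
        start += stride
    # Final window ends exactly at the end of the stream.
    windows.append(tokens[len(tokens) - size:])
    return windows
```

For example, `sliding_windows("a b c d e f g".split(), size=4, stride=2)` returns `[['a', 'b', 'c', 'd'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g']]`. Because every window has the same length, each new translation request in an online setting covers a bounded span of input rather than an ever-growing prefix, which is consistent with the abstract's point that a fixed-length window reduces flicker.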
