Phone Features Improve Speech Translation

Paper Title

Phone Features Improve Speech Translation

Authors

Elizabeth Salesky, Alan W Black

Abstract

End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end models across high, medium, and low-resource conditions, and show that cascades remain stronger baselines. Further, we introduce two methods to incorporate phone features into ST models. We show that these features improve both architectures, closing the gap between end-to-end models and cascades, and outperforming previous academic work -- by up to 9 BLEU on our low-resource setting.
