Title
Consecutive Decoding for Speech-to-text Translation
Authors
Abstract
Speech-to-text translation (ST), which directly translates source language speech into target language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate the source transcript and the target translation text with a single decoder. This benefits model training, as additional large parallel text corpora can be fully exploited to enhance speech translation training. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the IWSLT2018 English-German dataset, and the TED English-Chinese dataset. Experiments show that our proposed COSTT outperforms or is on par with the previous state-of-the-art methods on all three datasets. We have released our code at \url{https://github.com/dqqcasia/st}.
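The key idea of a single decoder consecutively emitting the transcript and then the translation can be sketched as follows. This is a minimal illustration of the training-target construction only, not the authors' implementation; the function name and the separator/end tokens (`<sep>`, `<eos>`) are assumptions:

```python
# Sketch (assumption, not the authors' code): a consecutive-decoding target
# concatenates the source transcript and the target translation so that one
# decoder first transcribes, then translates.

def build_consecutive_target(transcript_tokens, translation_tokens,
                             sep_token="<sep>", eos_token="<eos>"):
    """Form a single decoder target: transcript, separator, translation, end."""
    return transcript_tokens + [sep_token] + translation_tokens + [eos_token]

# Hypothetical English-French example.
target = build_consecutive_target(
    ["the", "cat", "sits"],          # source transcript (English)
    ["le", "chat", "est", "assis"],  # target translation (French)
)
print(target)
```

At inference time, under this scheme, the tokens before the separator would be read off as the transcription and those after it as the translation, so intermediate transcription supervision and extra parallel text can both shape the same decoder.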