Paper Title

M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation

Paper Authors

Zhao, Jinming; Yang, Hao; Shareghi, Ehsan; Haffari, Gholamreza

Paper Abstract

End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant training gap between pre-training and fine-tuning, largely due to the modality differences between the speech outputs of the encoder and the text inputs expected by the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces the features desired for speech-to-text translation by modelling global and local dependencies of the speech sequence. Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU point on the MuST-C En→De dataset. Our code is available at https://github.com/mingzi151/w2v2-st.
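
The abstract describes the module only at a high level. As a rough illustration of the idea, the following hypothetical PyTorch sketch shrinks a speech-feature sequence with a strided convolution (local dependencies) and then applies multi-head self-attention over the shortened sequence (global dependencies). The module name AdapterSketch, the layer sizes, and the exact composition are illustrative assumptions rather than the authors' actual M-Adapter implementation; see the linked repository for that.

# Hypothetical sketch (not the authors' released M-Adapter code): an adapter that
# shortens a speech-feature sequence while modelling local dependencies with a
# strided convolution and global dependencies with multi-head self-attention.
import torch
import torch.nn as nn

class AdapterSketch(nn.Module):
    def __init__(self, dim=768, num_heads=8, shrink_factor=2):
        super().__init__()
        # Strided 1-D convolution: captures local context and shortens the
        # sequence by roughly shrink_factor.
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, stride=shrink_factor, padding=1)
        # Self-attention over the shortened sequence: captures global context.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, time, dim) speech encoder outputs, e.g. from wav2vec 2.0
        y = torch.relu(self.conv(x.transpose(1, 2)).transpose(1, 2))
        attn_out, _ = self.attn(y, y, y)      # global dependencies
        return self.norm(y + attn_out)        # residual connection + layer norm

# Toy usage: 100 speech frames are shrunk to 50 adapted positions.
feats = torch.randn(4, 100, 768)
adapted = AdapterSketch()(feats)
print(adapted.shape)                          # torch.Size([4, 50, 768])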
