Towards Zero-Shot Code-Switched Speech Recognition

Paper title

Towards Zero-Shot Code-Switched Speech Recognition

Authors

Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe

Abstract

In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, these methods require the monolingual modules to perform language segmentation. That is, each monolingual module has to simultaneously detect CS points and transcribe speech segments of one language while ignoring those of other languages -- not a trivial task. We propose to simplify each monolingual module by allowing them to transcribe all speech segments indiscriminately with a monolingual script (i.e. transliteration). This simple modification passes the responsibility of CS point detection to subsequent bilingual modules which determine the final output by considering multiple monolingual transliterations along with external language model information. We apply this transliteration-based approach in an end-to-end differentiable neural network and demonstrate its efficacy for zero-shot CS ASR on Mandarin-English SEAME test sets.
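The division of labor described in the abstract can be illustrated with a toy sketch. This is not the authors' model, which is an end-to-end differentiable neural network. Here we assume, purely for illustration, that the speech is already split into aligned segments, that each monolingual module returns one transliteration per segment with a log-probability, and that the bilingual step simply picks the higher-scoring script per segment. The names `Hypothesis`, `toy_lm_score`, `select_segments`, and `lm_weight`, and all scores and example strings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str        # one segment, written in a single monolingual script
    log_prob: float  # hypothetical score from that monolingual module


def toy_lm_score(prev: str, text: str) -> float:
    """Stand-in for an external language model; always returns 0.0 here."""
    return 0.0


def select_segments(mandarin, english, lm_weight=0.5, lm=toy_lm_score):
    """Per segment, keep the transliteration with the higher combined score.

    Each monolingual module has transcribed *every* segment in its own
    script, so the code-switch decision is made only at this step.
    """
    output = []
    for zh, en in zip(mandarin, english):
        prev = output[-1] if output else ""
        zh_score = zh.log_prob + lm_weight * lm(prev, zh.text)
        en_score = en.log_prob + lm_weight * lm(prev, en.text)
        output.append(zh.text if zh_score >= en_score else en.text)
    return output


# Made-up example: a Mandarin segment followed by an English segment.
# Both modules transliterate both segments; scores are invented.
mandarin = [Hypothesis("我想买", -1.0), Hypothesis("爱疯", -4.0)]
english = [Hypothesis("wo shang my", -5.0), Hypothesis("iphone", -0.5)]
result = select_segments(mandarin, english)
print(result)
```

In this toy setup neither monolingual module has to detect the code-switch point; it emerges from comparing the two transliterations, which is the responsibility the abstract assigns to the bilingual modules.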
