Paper Title

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Authors

Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

Abstract

Low-resource automatic speech recognition (ASR) is challenging, as the scarce target-language data cannot adequately train an ASR model. To address this issue, meta-learning formulates ASR on each source language as many small ASR tasks and meta-learns a model initialization over all tasks from the different source languages, enabling fast adaptation to unseen target languages. However, the quantity and difficulty of tasks vary greatly across source languages, owing to their different data scales and diverse phonological systems. This leads to task-quantity and task-difficulty imbalance issues and thus to the failure of multilingual meta-learning ASR (MML-ASR). In this work, we solve this problem by developing a novel adversarial meta sampling (AMS) approach to improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language. Specifically, a large query loss for a source language means that, given its task quantity and difficulty, its tasks have not yet trained the ASR model well, and so the language should be sampled more frequently for extra learning. Inspired by this observation, we feed the historical task query losses of all source-language domains into a network that learns a task sampling policy to adversarially increase the current query loss of MML-ASR. The learnt policy thus tracks the learning progress of each language and predicts a good task sampling probability for each, yielding more effective learning. Finally, experimental results on two multilingual datasets show significant performance improvements when applying AMS to MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and to transfer-learning ASR approaches.
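The loss-driven sampling idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it replaces the learned policy network with a softmax over per-language exponential moving averages of query losses, so that languages whose tasks currently incur high query loss are sampled more often. The language count, constants, and the simulated loss model are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 source languages whose tasks differ in difficulty;
# sampling a task from a harder language yields a larger query loss.
true_difficulty = np.array([0.2, 0.5, 1.0, 2.0])
n_langs = len(true_difficulty)

ema_loss = np.ones(n_langs)   # running estimate of each language's query loss
alpha, temperature = 0.1, 0.5

def sampling_probs(ema, temp):
    """Softmax over loss estimates: high-loss languages get sampled more."""
    z = ema / temp
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(1000):
    p = sampling_probs(ema_loss, temperature)
    lang = rng.choice(n_langs, p=p)
    # Simulated (noisy) query loss of a meta-task from the chosen language;
    # in MML-ASR this would come from evaluating the adapted model on the
    # task's query set.
    q_loss = true_difficulty[lang] + 0.1 * rng.standard_normal()
    ema_loss[lang] = (1 - alpha) * ema_loss[lang] + alpha * q_loss

p = sampling_probs(ema_loss, temperature)
print(np.round(p, 3))  # hardest language (index 3) gets the largest share
```

The softmax keeps every language's probability nonzero, so easier languages are still revisited occasionally while the hardest one dominates the task stream.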
