Paper Title

Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks

Paper Authors

Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma

Paper Abstract

Unsupervised cross-lingual speech representation learning (XLSR) has recently shown promising results in speech recognition by leveraging vast amounts of unlabeled data across multiple languages. However, the standard XLSR model suffers from a language interference problem due to its lack of language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models. More importantly, we propose a novel language adaptive pre-training approach based on sparse sharing sub-networks. It makes room for language-specific modeling by pruning out unimportant parameters for each language, without requiring any manually designed language-specific component. After pruning, each language maintains only a sparse sub-network, while the sub-networks are partially shared with one another. Experimental results on a downstream multilingual speech recognition task show that our proposed method significantly outperforms baseline XLSR models on both high-resource and low-resource languages. Moreover, it consistently outperforms other adaptation methods while requiring fewer parameters.
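
The abstract describes the pruning step only at a high level. Below is a minimal sketch of the idea, assuming a PyTorch-style setup, plain magnitude pruning as the importance criterion, a 50% sparsity level, and a toy encoder standing in for the real XLSR Transformer; all function names here (magnitude_mask, prune_for_language, masked_forward) are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    def magnitude_mask(scores: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Binary mask keeping the top (1 - sparsity) fraction of entries by score."""
        k = max(1, int(scores.numel() * (1.0 - sparsity)))
        threshold = scores.flatten().topk(k).values.min()
        return (scores >= threshold).to(scores.dtype)

    def prune_for_language(model: nn.Module, sparsity: float) -> dict:
        """Derive a sparse sub-network mask for one language.

        Importance here is plain weight magnitude; in the paper's setting the
        shared model would first be adapted on that language's data before
        pruning, so the surviving weights differ from language to language.
        """
        masks = {}
        for name, param in model.named_parameters():
            if param.dim() < 2:  # keep biases / norm parameters dense
                masks[name] = torch.ones_like(param)
            else:
                masks[name] = magnitude_mask(param.abs(), sparsity)
        return masks

    def masked_forward(model: nn.Module, masks: dict, x: torch.Tensor) -> torch.Tensor:
        """Run the shared model with one language's mask applied, then restore weights."""
        originals = {n: p.detach().clone() for n, p in model.named_parameters()}
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.mul_(masks[name])
        out = model(x)
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.copy_(originals[name])
        return out

    if __name__ == "__main__":
        # Toy stand-in for the XLSR encoder; the real model is Transformer-based.
        encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
        mask_en = prune_for_language(encoder, sparsity=0.5)
        y = masked_forward(encoder, mask_en, torch.randn(4, 16))
        print(y.shape)  # torch.Size([4, 8])

Because each language's mask keeps a different subset of the shared weights, two languages overlap exactly where both masks retain a weight, which is the partial sharing between sub-networks that the abstract refers to.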
