Federated Domain Adaptation for ASR with Full Self-Supervision

Paper title

Federated Domain Adaptation for ASR with Full Self-Supervision

Authors

Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide

Abstract

Cross-device federated learning (FL) protects user privacy by collaboratively training a model on user devices, thereby eliminating the need for collecting, storing, and manually labeling user data. While important topics such as the FL training algorithm, non-IID-ness, and Differential Privacy have been well studied in the literature, this paper focuses on two challenges of practical importance for improving on-device ASR: the lack of ground-truth transcriptions and the scarcity of compute resources and network bandwidth on edge devices. First, we propose an FL system for on-device ASR domain adaptation with full self-supervision, which uses self-labeling together with data augmentation and filtering techniques. The system can improve a strong Emformer-Transducer based ASR model pretrained on out-of-domain data, using in-domain audio without any ground-truth transcriptions. Second, to reduce the training cost, we propose a self-restricted RNN Transducer (SR-RNN-T) loss, a variant of alignment-restricted RNN-T that uses Viterbi alignments from self-supervision. To further reduce the compute and network cost, we systematically explore adapting only a subset of weights in the Emformer-Transducer. Our best training recipe achieves a $12.9\%$ relative WER reduction over the strong out-of-domain baseline, which equals $70\%$ of the reduction achievable with full human supervision and centralized training.
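
The two percentages in the last sentence of the abstract can be related with a short calculation. The sketch below is illustrative only: it assumes the $70\%$ figure is the ratio of the two relative WER reductions, and the function names and the baseline WER of 10.0 are hypothetical, since the abstract reports no absolute WER values.

```python
def implied_supervised_reduction(self_supervised_reduction: float,
                                 fraction_recovered: float) -> float:
    """Relative WER reduction implied for supervised, centralized training.

    Assumes fraction_recovered = self_supervised_reduction / supervised_reduction.
    """
    if not 0.0 < fraction_recovered <= 1.0:
        raise ValueError("fraction_recovered must be in (0, 1]")
    return self_supervised_reduction / fraction_recovered


def wer_after(baseline_wer: float, relative_reduction: float) -> float:
    """Apply a relative WER reduction to a baseline WER."""
    return baseline_wer * (1.0 - relative_reduction)


# Figures taken from the abstract: 12.9% relative reduction, 70% of the
# supervised reduction.
supervised = implied_supervised_reduction(0.129, 0.70)
print(f"Implied supervised relative WER reduction: {supervised:.1%}")

# Hypothetical baseline WER, used only to show the scale of the change.
baseline = 10.0
print(f"Self-supervised FL: {wer_after(baseline, 0.129):.2f}")
print(f"Supervised, centralized: {wer_after(baseline, supervised):.2f}")
```

Under that assumption, full human supervision with centralized training would correspond to a relative WER reduction of roughly 18.4% over the same out-of-domain baseline.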
