Paper Title
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition
Paper Authors
Paper Abstract
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on private data. Such noise perturbation often results in severe performance degradation in automatic speech recognition (ASR) in order to meet a privacy budget $\varepsilon$. Private aggregation of teacher ensembles (PATE) utilizes ensemble probabilities to improve ASR accuracy when dealing with the noise effects controlled by small values of $\varepsilon$. We extend PATE learning to work with dynamic patterns, namely speech utterances, and perform the first experimental demonstration that it prevents acoustic data leakage in ASR training. We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora. PATE learning-enhanced ASR models outperform the benchmark DP-SGD mechanism, especially under strict DP budgets, giving relative word error rate reductions between 26.2% and 27.5% for an RNN transducer model evaluated on LibriSpeech. We also introduce a DP-preserving ASR solution for pretraining on public speech corpora.
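The core PATE mechanism the abstract refers to can be illustrated with a minimal sketch: several teachers, each trained on a disjoint shard of the private data, vote on a label, and Laplace noise scaled by $1/\varepsilon$ is added to the vote counts before taking the argmax. This is a generic illustration of noisy-max aggregation, not the paper's actual implementation; the function name `noisy_argmax` and the example labels are hypothetical.

```python
import math
import random
from collections import Counter

def noisy_argmax(teacher_votes, epsilon, rng=None):
    """PATE-style noisy-max aggregation (illustrative sketch only).

    Each teacher, trained on a disjoint data shard, casts one vote;
    Laplace(1/epsilon) noise is added to every label's vote count, and
    the label with the largest noisy count becomes the student's
    training label. Smaller epsilon means more noise, hence stronger
    privacy but a harder learning problem for the student.
    """
    rng = rng or random.Random()
    counts = Counter(teacher_votes)

    def laplace(scale):
        # Inverse-CDF sampling of a zero-mean Laplace distribution.
        u = rng.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    noisy = {label: n + laplace(1.0 / epsilon) for label, n in counts.items()}
    return max(noisy, key=noisy.get)

# With 9 of 10 teachers agreeing and a loose budget (large epsilon),
# the noise is small relative to the vote gap, so the majority label
# is almost surely returned.
```

Under a strict budget (small $\varepsilon$), the injected noise can flip the aggregate label, which is exactly the regime where the paper reports PATE-trained ASR models degrading more gracefully than DP-SGD.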