Paper Title
Inference skipping for more efficient real-time speech enhancement with parallel RNNs
Paper Authors
Paper Abstract
Deep neural network (DNN)-based speech enhancement models have attracted extensive attention due to their promising performance. However, it is difficult to deploy a powerful DNN in real-time applications because of its high computational cost. Typical compression methods such as pruning and quantization do not make good use of the data characteristics. In this paper, we introduce the Skip-RNN strategy into speech enhancement models with parallel RNNs. The states of the RNNs update intermittently without interrupting the update of the output mask, which leads to a significant reduction in computational load without evident audio artifacts. To better leverage the difference between speech and noise, we further regularize the skipping strategy with voice activity detection (VAD) guidance, saving more computational load. Experiments on a high-performance speech enhancement model, the dual-path convolutional recurrent network (DPCRN), show the superiority of our strategy over alternatives such as network pruning or directly training a smaller model. We also validate the generalization of the proposed strategy on two other competitive speech enhancement models.
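The core mechanism described in the abstract, a recurrent state that is updated only on selected frames while an output is still produced for every frame, can be sketched as follows. This is a minimal illustration assuming PyTorch; `SkipGRUCell`, the scalar update gate, and the `vad_t` guidance input are hypothetical names for exposition, not the authors' implementation, and a real training setup would need a straight-through estimator for the hard threshold.

```python
import torch
import torch.nn as nn

class SkipGRUCell(nn.Module):
    """Illustrative Skip-RNN cell: the hidden state updates only on
    selected frames, but an output state is available every frame, so
    downstream mask estimation never stalls."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Scalar gate deciding whether to update the state this frame.
        self.update_gate = nn.Linear(hidden_size, 1)

    def forward(self, x_t, h_prev, vad_t=None):
        # x_t: (batch, input_size), h_prev: (batch, hidden_size),
        # vad_t: optional (batch, 1) binary speech-activity flag.
        p_update = torch.sigmoid(self.update_gate(h_prev))
        update = (p_update > 0.5).float()  # hard skip decision
        if vad_t is not None:
            # Hypothetical VAD guidance: force state updates on
            # speech-active frames, allow skipping on noise-only ones.
            update = torch.maximum(update, vad_t)
        h_new = self.cell(x_t, h_prev)
        # Where update == 0, the old state is reused; at inference the
        # GRU computation for those frames can be skipped entirely.
        h_t = update * h_new + (1.0 - update) * h_prev
        return h_t, update
```

At inference time, the practical saving comes from evaluating `self.cell` only on frames where `update` is 1; skipped frames reuse `h_prev` directly, so the output mask is still computed every frame, merely from a slightly stale recurrent state.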