Paper Title


TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

Authors

Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

Abstract


Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery powered and run on resource-constrained microcontroller units (MCUs) with limited memory capacity and compute capability. In this work, we use model compression techniques to bridge this gap. We define the constraints imposed on the RNN by the HW and describe a method to satisfy them. Although model compression techniques are an active area of research, we are the first to demonstrate their efficacy for RNN speech enhancement, using pruning and integer quantization of weights/activations. We also demonstrate state update skipping, which reduces the computational load. Finally, we conduct a perceptual evaluation of the compressed models to verify audio quality on human raters. Results show a reduction in model size and operations of 11.9$\times$ and 2.9$\times$, respectively, over the baseline for compressed models, without a statistical difference in listening preference and only exhibiting a loss of 0.55dB SDR. Our model achieves a computational latency of 2.39ms, well within the 10ms target and 351$\times$ better than previous work.
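The abstract mentions pruning and integer quantization of RNN weights/activations as the main compression techniques. Below is a minimal sketch in Python (NumPy) of magnitude pruning followed by symmetric int8 quantization applied to a stand-in LSTM weight matrix; the matrix shape, 75% sparsity level, and per-tensor quantization scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: magnitude pruning + symmetric int8 quantization of an LSTM
# weight matrix. Shapes, sparsity, and bit width are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in input-to-hidden LSTM weights (the 4 gates stacked row-wise).
hidden, inp = 256, 257
W = rng.standard_normal((4 * hidden, inp)).astype(np.float32)

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until ~`sparsity` of them are zero."""
    k = int(sparsity * w.size)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns integer weights and a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

W_pruned = magnitude_prune(W, sparsity=0.75)   # assumed sparsity level
W_q, scale = quantize_int8(W_pruned)

# Rough accounting of memory saved versus the dense fp32 baseline
# (sparse index overhead is ignored for simplicity).
dense_bytes = W.size * 4
nonzero = int(np.count_nonzero(W_q))
print(f"nonzero weights: {nonzero}/{W.size}")
print(f"approx. compression: {dense_bytes / nonzero:.1f}x")
print(f"max dequantization error: {np.abs(W_pruned - W_q * scale).max():.4f}")
```

In practice such pruning and quantization are applied (or fine-tuned) during training rather than post hoc as above; this sketch only illustrates the storage and arithmetic format implied by the reported model-size and operation reductions.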
