Multimodal Speech Enhancement Using Burst Propagation

Paper Title

Multimodal Speech Enhancement Using Burst Propagation

Authors

Raza, Mohsin; Passos, Leandro A.; Khubaib, Ahmed; Adeel, Ahsan

Abstract

This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that considers the most recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing the feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from these capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted on a Grid Corpus and CHiME3-based dataset show that MBURST can reproduce mask reconstructions similar to those of the multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency, reducing neuron firing rates by up to 70%. Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or other similar embedded systems.
