Paper Title

TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

Paper Authors

Andong Li, Guochen Yu, Chengshi Zheng, Xiaodong Li

Paper Abstract

While existing end-to-end beamformers achieve impressive performance in various front-end speech processing tasks, they usually encapsulate the whole process into a black box and thus lack adequate interpretability. As an attempt to fill the blank, we propose a novel neural beamformer inspired by Taylor's approximation theory called TaylorBeamformer for multi-channel speech enhancement. The core idea is that the recovery process can be formulated as the spatial filtering in the neighborhood of the input mixture. Based on that, we decompose it into the superimposition of the 0th-order non-derivative and high-order derivative terms, where the former serves as the spatial filter and the latter is viewed as the residual noise canceller to further improve the speech quality. To enable end-to-end training, we replace the derivative operations with trainable networks and thus can learn from training data. Extensive experiments are conducted on the synthesized dataset based on LibriSpeech and results show that the proposed approach performs favorably against the previous advanced baselines.
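
As a rough illustration of the decomposition described in the abstract, the sketch below models the enhanced speech as a 0th-order spatial-filtering term plus a sum of higher-order "derivative" terms produced by trainable networks. This is a minimal PyTorch sketch under assumed placeholders: the module names (SpatialFilter-style Conv1d, DerivativeBlock), waveform-domain formulation, layer choices, and shapes are hypothetical and are not the paper's actual architecture.

```python
# Minimal sketch: estimate = 0th-order spatial filter + sum of trainable
# higher-order "derivative" terms (residual noise cancellers).
# All module names and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn


class DerivativeBlock(nn.Module):
    """Hypothetical trainable stand-in for one derivative operator."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class TaylorStyleBeamformer(nn.Module):
    """Superposition of a 0th-order spatial filter and N higher-order terms."""

    def __init__(self, mics: int, order: int = 3):
        super().__init__()
        # 0th-order term: maps the multi-channel mixture to a single-channel estimate.
        self.spatial_filter = nn.Conv1d(mics, 1, kernel_size=1)
        # Higher-order terms: trainable networks replacing the derivative operations.
        self.derivatives = nn.ModuleList(DerivativeBlock(1) for _ in range(order))

    def forward(self, mixture):
        # mixture: (batch, mics, samples)
        estimate = self.spatial_filter(mixture)   # 0th-order spatial filtering
        term = estimate
        for block in self.derivatives:
            term = block(term)                    # next "derivative" term
            estimate = estimate + term            # superpose onto the running sum
        return estimate


# Usage: enhance a batch of two 4-mic mixtures, 16000 samples each.
model = TaylorStyleBeamformer(mics=4, order=3)
enhanced = model(torch.randn(2, 4, 16000))
print(enhanced.shape)  # torch.Size([2, 1, 16000])
```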
