Paper Title


One-shot conditional audio filtering of arbitrary sounds

Paper Authors

Beat Gfeller, Dominik Roblek, Marco Tagliasacchi

Abstract


We consider the problem of separating a particular sound source from a single-channel mixture, based on only a short sample of the target source. Using SoundFilter, a wave-to-wave neural network architecture, we can train a model without using any sound class labels. Using a conditioning encoder model which is learned jointly with the source separation network, the trained model can be "configured" to filter arbitrary sound sources, even ones that it has not seen during training. Evaluated on the FSD50K dataset, our model obtains an SI-SDR improvement of 9.6 dB for mixtures of two sounds. When trained on LibriSpeech, our model achieves an SI-SDR improvement of 14.0 dB when separating one voice from a mixture of two speakers. Moreover, we show that the representation learned by the conditioning encoder clusters acoustically similar sounds together in the embedding space, even though it is trained without using any labels.
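The abstract reports results as SI-SDR (scale-invariant signal-to-distortion ratio) improvements. As a hedged illustration of how this standard metric is computed (the function below is not from the paper; it is a minimal sketch of the common definition), the estimate is projected onto the reference with an optimal scale factor, and the energy ratio between the scaled target and the residual is expressed in dB:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between a separated estimate and a reference signal."""
    # Zero-mean both signals, as is standard practice for SI-SDR.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scale: project the estimate onto the reference signal.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # scaled reference component of the estimate
    noise = estimate - target           # everything the estimate got wrong
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

An "SI-SDR improvement" such as the 9.6 dB reported above is then `si_sdr(separated, target) - si_sdr(mixture, target)`, i.e. how much the model improves over simply leaving the mixture untouched. Because of the optimal scale factor, the metric is invariant to rescaling the estimate.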
