Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

Paper Title

Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

Authors

Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller

Abstract

In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation. We build upon a previously proposed method for learning representations of time-domain music signals with a re-parameterized denoising autoencoder, and extend it by using the family of Sinkhorn distances with entropic regularization. We evaluate our method on the freely available MUSDB18 dataset of professionally produced music recordings. Our results show that Sinkhorn distances with a small strength of entropic regularization marginally improve the performance of informed singing voice separation. When the strength of the entropic regularization is increased, the learned representations of the mixture signal consist of almost perfectly additive and distinctly structured sources.
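For readers unfamiliar with the term, the following is a minimal sketch of a Sinkhorn distance with entropic regularization between two histograms, written with NumPy. It is a generic textbook illustration, not the authors' implementation: the function name `sinkhorn_distance`, the toy histograms, the squared-distance cost matrix, and the parameter values are all assumptions chosen for the example. The `reg` parameter plays the role of the strength of the entropic regularization mentioned in the abstract.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=0.5, n_iters=200):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : 1-D arrays of non-negative weights, each summing to 1.
    cost : 2-D array, cost[i, j] is the cost of moving mass from bin i to bin j.
    reg  : strength of the entropic regularization (hypothetical default).
    """
    kernel = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)             # rescale to match column marginals
        u = a / (kernel @ v)               # rescale to match row marginals
    plan = u[:, None] * kernel * v[None, :]  # transport plan
    return float(np.sum(plan * cost)), plan

# Toy example (assumed data): two histograms on five points in [0, 1].
x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

dist, plan = sinkhorn_distance(a, b, cost, reg=0.5)
print(dist)
print(plan.sum(axis=1), plan.sum(axis=0))  # should approximate a and b
```

A larger `reg` yields a more diffuse transport plan, while a smaller `reg` brings the result closer to the unregularized optimal transport cost, at the price of slower convergence of the iterations.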
