Paper Title
Few-Shot Musical Source Separation
Paper Authors
Paper Abstract
Deep learning-based approaches to musical source separation are often limited to the instrument classes the models are trained on and do not generalize to separating unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using a few audio examples of the target instrument. We train a few-shot conditioning encoder jointly with the U-Net to encode the audio examples into a conditioning vector that configures the U-Net via feature-wise linear modulation (FiLM). We evaluate the trained models on real musical recordings in the MUSDB18 and MedleyDB datasets. We show that our proposed few-shot conditioning paradigm outperforms the baseline one-hot instrument-class-conditioned model for both seen and unseen instruments. To extend the scope of our approach to a wider variety of real-world scenarios, we also experiment with different conditioning-example characteristics, including examples from different recordings, examples containing multiple sources, and negative conditioning examples.
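The abstract describes conditioning a U-Net via FiLM, where a conditioning vector predicts per-channel scale and shift parameters that modulate the network's feature maps. Below is a minimal PyTorch sketch of that mechanism under stated assumptions: the conditioning encoder architecture, embedding size, and average pooling over the K example embeddings are illustrative choices, not the paper's actual design.

```python
# Minimal sketch of few-shot FiLM conditioning (assumed PyTorch setup).
# The encoder layout and pooling strategy are hypothetical, chosen only
# to illustrate how a conditioning vector can configure a U-Net via FiLM.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Feature-wise linear modulation: scales and shifts feature maps
    channel-wise using parameters predicted from a conditioning vector."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the condition.
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, freq, time); cond: (batch, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]  # broadcast over freq and time
        beta = beta[:, :, None, None]
        return gamma * features + beta


class FewShotConditioner(nn.Module):
    """Encodes K audio examples (as magnitude spectrograms) into a single
    conditioning vector by averaging per-example embeddings (an assumed
    design, not necessarily the paper's encoder)."""

    def __init__(self, cond_dim: int = 256):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, cond_dim),
        )

    def forward(self, examples: torch.Tensor) -> torch.Tensor:
        # examples: (batch, K, 1, freq, time) -> (batch, cond_dim)
        b, k = examples.shape[:2]
        emb = self.embed(examples.flatten(0, 1))  # (batch*K, cond_dim)
        return emb.view(b, k, -1).mean(dim=1)     # average over the K shots


# Usage: modulate one U-Net feature map with the few-shot condition.
conditioner = FewShotConditioner(cond_dim=256)
film = FiLM(cond_dim=256, num_channels=64)
examples = torch.randn(2, 4, 1, 128, 64)   # 2 mixtures, K=4 examples each
features = torch.randn(2, 64, 128, 64)     # feature maps inside the U-Net
modulated = film(features, conditioner(examples))
```

One property of averaging the K per-example embeddings is that the conditioning vector becomes invariant to example order and to the number of shots; in a full model, a FiLM layer of this kind would typically be applied at several U-Net layers rather than just one.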