Paper Title
SpaIn-Net: Spatially-Informed Stereophonic Music Source Separation
Authors
Abstract
With the recent advancements of data-driven approaches using deep neural networks, music source separation has been formulated as an instrument-specific supervised problem. While existing deep learning models implicitly absorb the spatial information conveyed by the multi-channel input signals, we argue that a more explicit and active use of spatial information could not only improve the separation process but also provide an entry point for many user-interaction-based tools. To this end, we introduce a control method based on the stereophonic location of the sources of interest, expressed as the panning angle. We present various conditioning mechanisms, including the use of the raw angle and its derived feature representations, and show that spatial information helps. Our proposed approaches improve the separation performance compared to location-agnostic architectures by 1.8 dB SI-SDR in our Slakh-based simulated experiments. Furthermore, the proposed methods allow for the disentanglement of same-class instruments, for example, in mixtures containing two guitar tracks. Finally, we also demonstrate that our approach is robust to incorrect source panning information, which can be incurred by our proposed user interaction.
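To make two terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one common way to place a mono source at a panning angle (a constant-power pan law, which is an assumption here, as is the convention that 0 degrees is hard left and 90 degrees is hard right) and the standard definition of SI-SDR for a single-channel signal. The function names `pan_stereo` and `si_sdr` are hypothetical.

```python
import numpy as np


def pan_stereo(mono, angle_deg):
    """Pan a mono signal to stereo with a constant-power law.

    Assumed convention (not from the paper): angle_deg lies in [0, 90],
    where 0 is hard left, 45 is center, and 90 is hard right.
    Returns an array of shape (2, n_samples): [left, right].
    """
    theta = np.deg2rad(angle_deg)
    return np.stack([np.cos(theta) * mono, np.sin(theta) * mono])


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1000)
    stereo = pan_stereo(source, 30.0)               # source placed left of center
    noisy = source + 0.1 * rng.standard_normal(1000)
    print(stereo.shape, round(si_sdr(noisy, source), 2))
```

Because SI-SDR projects the estimate onto the reference before measuring the residual, rescaling the estimate leaves the score unchanged, so a figure such as the 1.8 dB improvement reported above reflects reduced distortion rather than a change in output gain.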