Paper title
Music Similarity Calculation of Individual Instrumental Sounds Using Metric Learning
Paper authors
Abstract
The criteria for measuring music similarity are important for developing a flexible music recommendation system. Several data-driven methods have been proposed to calculate music similarity from music signals alone, such as metric learning based on a triplet loss using tag information for each musical piece. However, the resulting similarity metric usually captures the piece as a whole, i.e., the mixture of various instrumental sound sources, which limits the capability of the recommendation system; for example, it is difficult to search for musical pieces containing similar drum sounds. Toward a more flexible music recommendation system, we propose a music similarity calculation method that focuses on individual instrumental sound sources in a musical piece. To fully exploit the potential of data-driven methods, we apply weakly supervised metric learning to individual instrumental sound source signals without using any tag information, where the positive and negative samples in a triplet loss are defined by whether or not they come from the same musical piece. Furthermore, because individual instrumental sound sources are not always available in practice, we also investigate the effect of using instrumental sound source separation to obtain each source. Experimental results show that (1) unique similarity metrics can be learned for individual instrumental sound sources, (2) similarity metrics learned from some instrumental sound sources can lead to more accurate results than those learned from the entire musical piece, (3) performance degrades when learning from separated instrumental sounds, and (4) the similarity metrics learned by the proposed method produce results that correspond well to human perception.
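The weak supervision described in the abstract, where positives and negatives are chosen only by whether two segments come from the same musical piece, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the margin value of 0.2, the Euclidean distance, and the hard-coded 2-D "embeddings" are all assumptions made for illustration. In practice the embeddings would be produced by a trained network from segments of a single instrument's signal.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.2 is an illustrative assumption, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplet(segments_by_track, rng):
    """Draw (anchor, positive, negative) using track identity as the only label.

    segments_by_track maps a track id to a list of embeddings, one per segment
    of a single instrument's signal (e.g. the drum stem of that track).
    Anchor and positive come from the same track; the negative comes from
    a different track. No tag information is used.
    """
    # The anchor track needs at least two segments to supply a positive.
    eligible = [t for t, segs in segments_by_track.items() if len(segs) >= 2]
    anchor_track = rng.choice(eligible)
    other_tracks = [t for t in segments_by_track if t != anchor_track]
    negative_track = rng.choice(other_tracks)
    anchor, positive = rng.sample(segments_by_track[anchor_track], 2)
    negative = rng.choice(segments_by_track[negative_track])
    return anchor, positive, negative


# Hypothetical toy embeddings for the drum source of two tracks.
drum_embeddings = {
    "track_a": [[0.0, 0.0], [0.1, 0.0]],
    "track_b": [[1.0, 1.0], [1.0, 0.9]],
}

rng = random.Random(0)
anchor, positive, negative = sample_triplet(drum_embeddings, rng)
loss = triplet_loss(anchor, positive, negative)
print(loss)
```

In this toy data the two tracks are already well separated, so the loss is zero for any sampled triplet. During training, a non-zero loss would push same-track segments together and different-track segments apart in the embedding space of that instrument.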