Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Paper Title

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Authors

Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

Abstract

Melody extraction is an important music information retrieval task, with potential applications in education and the music industry. It is a notoriously challenging task because of the presence of background instruments, and because the melodic source often exhibits characteristics similar to those of the other instruments. Accompaniment that interferes with the vocals makes extracting the melody from the mixture signal harder still. Until recently, classical signal-processing methods were the most popular approach among melody extraction researchers. The ability of deep learning models to handle large-scale data and to learn features automatically by exploiting spatial and temporal dependencies has inspired many researchers to adopt them for melody extraction. This paper reviews up-to-date, data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models are categorized by the type of neural network used and by the output representation they use to predict the melody. The architectures of 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters are grouped into four broad categories, and the loss functions used by the individual models are briefly described. The input representations adopted by the models and their parameter settings are described in detail. A section on the explainability of black-box deep neural networks for melody extraction is included. The performance of the 25 melody extraction methods is compared, and possible future directions for exploring and improving melody extraction methods are presented.
