Paper Title

Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning

Paper Authors

Desblancs, Dorian

Paper Abstract

Annotating musical beats is a very long and tedious process. To address this problem, we present a new self-supervised learning pretext task for beat tracking and downbeat estimation. This task makes use of Spleeter, an audio source separation model, to separate a song's drums from the rest of its signal. The drum signals are used as positives, and by extension negatives, for contrastive learning pre-training. The drum-less signals, on the other hand, are used as anchors. When a fully-convolutional and recurrent model is pre-trained using this pretext task, it learns an onset function. In some cases, this function is found to map to periodic elements in a song. We find that pre-trained models outperform randomly initialized models when the beat tracking training set is extremely small (fewer than 10 examples). When this is not the case, pre-training leads to a learning speed-up that causes the model to overfit the training set. More generally, this work defines new perspectives in the realm of musical self-supervised learning. It is notably one of the first works to use audio source separation as a fundamental component of self-supervision.
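The abstract describes the pretext task only at a high level. As a rough illustration (not the author's exact pipeline), the sketch below shows how Spleeter-separated stems could be paired for an InfoNCE-style contrastive objective: the drum stem of an excerpt acts as the positive for the drum-less mix of the same excerpt, while drum stems from other excerpts in the batch act as negatives. The encoder, temperature, and exact loss formulation are assumptions made for illustration; the paper's fully-convolutional and recurrent architecture is not reproduced here.

```python
# Hypothetical sketch of the source-separation-based contrastive pretext task.
# Spleeter splits each excerpt into a drum stem (positive) and a drum-less mix
# (anchor); an InfoNCE-style loss pulls matching pairs together and pushes apart
# the drum stems of other excerpts in the batch (negatives).
import numpy as np
import torch
import torch.nn.functional as F
from spleeter.separator import Separator

separator = Separator("spleeter:4stems")  # stems: vocals / drums / bass / other


def split_drums(waveform: np.ndarray):
    """waveform: (n_samples, 2) float array. Returns (drums, drum_less_mix)."""
    stems = separator.separate(waveform)
    drums = stems["drums"]
    drumless = stems["vocals"] + stems["bass"] + stems["other"]
    return drums, drumless


def info_nce(anchor_emb: torch.Tensor, positive_emb: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """anchor_emb, positive_emb: (batch, dim) embeddings from some encoder.
    Each anchor's positive is the drum stem of the same excerpt; all other
    drum stems in the batch serve as negatives. Temperature is an assumed value."""
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    logits = anchor_emb @ positive_emb.T / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```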
