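Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music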

Paper title

Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music

Authors

Kum, Sangeun, Lee, Jongpil, Kim, Keunhyoung Luke, Kim, Taehyoung, Nam, Juhan

Abstract

The lack of large-scale note-level labeled data is the major obstacle to singing transcription from polyphonic music. We address the issue by using pseudo labels generated by vocal pitch estimation models from unlabeled data. The proposed method first converts the frame-level pseudo labels to note-level labels through pitch and rhythm quantization steps. Then, it further improves the label quality through self-training in a teacher-student framework. To validate the method, we run experiments that vary three factors: the vocal pitch estimation model used as the pseudo-label generator (two models), the teacher-student framework setup (two setups), and the number of self-training iterations. The results show that the proposed method can effectively leverage large-scale unlabeled audio data and that self-training with the noisy student model helps to improve performance. Finally, we show that the model trained with only unlabeled data performs comparably to previous work, and that the model trained with additional labeled data achieves higher accuracy than the model trained with only labeled data.
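
The frame-to-note conversion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function name `frames_to_notes`, the parameters `hop_sec`, `grid_frames`, and `min_note_frames`, and the specific rules (round to the nearest semitone, snap boundaries to a fixed grid, drop short notes) are all assumptions chosen for illustration.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def frames_to_notes(f0_hz, hop_sec=0.01, grid_frames=5, min_note_frames=10):
    """Turn a frame-level f0 track into (onset_sec, offset_sec, midi) notes.

    Hypothetical parameters, not taken from the paper:
      hop_sec         -- time between consecutive frames
      grid_frames     -- rhythm grid size, in frames
      min_note_frames -- notes shorter than this (after snapping) are dropped
    Frames with f0 <= 0 are treated as unvoiced.
    """
    # Pitch quantization: round each voiced frame to the nearest semitone.
    pitches = [round(hz_to_midi(f)) if f > 0 else None for f in f0_hz]

    def snap(frame_index):
        # Rhythm quantization: round half up to the nearest grid boundary,
        # using integer arithmetic to avoid floating-point surprises.
        return (frame_index + grid_frames // 2) // grid_frames * grid_frames

    notes = []
    start = 0
    for i in range(1, len(pitches) + 1):
        # A run ends at the end of the track or when the quantized pitch changes.
        if i == len(pitches) or pitches[i] != pitches[start]:
            if pitches[start] is not None:
                onset, offset = snap(start), snap(i)
                if offset - onset >= min_note_frames:
                    notes.append((onset * hop_sec, offset * hop_sec, pitches[start]))
            start = i
    return notes


# Toy f0 track: silence, A4, silence, a short blip near C4, silence, C5.
f0 = [0.0] * 5 + [440.0] * 20 + [0.0] * 5 + [262.0] * 3 + [0.0] * 2 + [523.25] * 15
notes = frames_to_notes(f0)
```

In this toy track the three-frame blip is discarded by the minimum-duration rule, leaving one note per sustained pitch. A real pitch tracker output would also contain vibrato and octave errors, which this sketch does not smooth; refining such noisy note labels is the role the abstract assigns to teacher-student self-training.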
