HarmoF0: Logarithmic Scale Dilated Convolution for Pitch Estimation

Paper Title

HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation

Authors

Weixing Wei, Peilin Li, Yi Yu, Wei Li

Abstract

Sounds, especially music, contain various harmonic components scattered along the frequency dimension. Ordinary convolutional neural networks have difficulty observing these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to efficiently capture the harmonic structure in logarithmic scale spectrograms. Harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and at the same time reduces the number of parameters by more than 90%. We also find that it has stronger noise resistance and fewer octave errors. The code and pre-trained model are available at https://github.com/WX-Wei/HarmoF0.
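The abstract's reliance on logarithmic scale spectrograms can be illustrated with a small sketch. On a log-frequency axis with a fixed number of bins per octave, the k-th harmonic of a tone lies roughly bins_per_octave * log2(k) bins above the fundamental, whatever the fundamental frequency is. The function name harmonic_bin_offsets and the value of 48 bins per octave are assumptions chosen for illustration; this is not the authors' implementation, which is available at the repository linked above.

```python
import math


def harmonic_bin_offsets(num_harmonics: int, bins_per_octave: int) -> list[int]:
    """Bin offset of each harmonic k = 1..num_harmonics above the fundamental.

    On a log-frequency axis, frequency k * f0 sits log2(k) octaves above f0,
    so the offset in bins does not depend on f0 itself.
    """
    return [
        round(bins_per_octave * math.log2(k))
        for k in range(1, num_harmonics + 1)
    ]


# Hypothetical resolution: 48 bins per octave (an assumption, not a value
# taken from the abstract).
offsets = harmonic_bin_offsets(num_harmonics=6, bins_per_octave=48)
print(offsets)  # [0, 48, 76, 96, 111, 124]
```

Because these offsets are the same for every pitch, a convolution whose taps are spaced at such distances can line up with the overtones of any note, which is one plausible reading of why dilation on a log-scale spectrogram suits harmonic structure.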
