Paper title
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch
Paper authors
Paper abstract
Growing research demonstrates that synthetic failure modes imply poor generalization. We compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the pitch distance between two stationary sinusoids. The results are surprising: many have a poor sense of pitch direction. These shortcomings are exposed using simple rank assumptions. Our task is trivial for humans but difficult for these audio distances, suggesting significant progress can be made in self-supervised audio learning by improving current losses.
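To make the benchmark idea concrete, the following is a minimal sketch of a rank check between stationary sinusoids. It is not the paper's setup: the L1 distance on magnitude spectra, the sample rate, the signal length, the test frequencies, and every function name here are assumptions chosen for illustration. The rank assumption it encodes is that a candidate whose pitch is further from the target should receive a larger distance.

```python
import cmath
import math


def sinusoid(freq_hz, sample_rate=8000, num_samples=256):
    """Stationary sine wave at a fixed frequency (illustrative parameters)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for the non-negative frequency bins."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def spectral_l1(a, b):
    """L1 distance between magnitude spectra (an assumed, simple spectral loss)."""
    spec_a = magnitude_spectrum(a)
    spec_b = magnitude_spectrum(b)
    return sum(abs(x - y) for x, y in zip(spec_a, spec_b))


def rank_check(target_hz, candidates_hz):
    """Return ([(freq, distance), ...], ok).

    Candidates are ordered by their pitch gap to the target; ok is True only
    if the distance strictly increases along that ordering.
    """
    target = sinusoid(target_hz)
    ordered = sorted(candidates_hz, key=lambda f: abs(f - target_hz))
    distances = [spectral_l1(target, sinusoid(f)) for f in ordered]
    ok = all(d1 < d2 for d1, d2 in zip(distances, distances[1:]))
    return list(zip(ordered, distances)), ok


if __name__ == "__main__":
    pairs, ok = rank_check(440.0, [450.0, 600.0, 1200.0, 2400.0])
    for freq, dist in pairs:
        print(f"{freq:7.1f} Hz  distance {dist:.2f}")
    print("ranking consistent with pitch gap:", ok)
```

Running the script prints the distance for each candidate and whether the ordering matches the pitch gaps. Swapping `spectral_l1` for another distance gives the same check for that loss.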