Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Paper Title

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Authors

Sungkyun Chang, Donmoon Lee, Jeongsoo Park, Hyungui Lim, Kyogu Lee, Karam Ko, Yoonchang Han

Abstract

Most existing audio fingerprinting systems are of limited use for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework derived from the segment-level search objective. Each training update uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas. These replicas simulate the degradation of the original audio signals by applying small time offsets and various types of distortion, such as background noise and room/microphone impulse responses. In the segment-level search task, where conventional audio fingerprinting systems used to fail, our system shows promising results while using 10x smaller storage. Our code and dataset are available at https://mimbres.github.io/neural-audio-fp/.
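
The two ingredients named in the abstract, a contrastive objective over original/replica pairs and a maximum inner-product search over stored fingerprints, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the exact loss, so an NT-Xent-style formulation is assumed here. The function names (l2_normalize, contrastive_loss, mips_search), the temperature of 0.05, and the 128-dimensional fingerprints are hypothetical choices for illustration, and an exhaustive search stands in for the fast search the abstract refers to.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so the inner product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def contrastive_loss(orig, repl, temperature=0.05):
    """NT-Xent-style loss (an assumed choice; the abstract does not name the loss).

    orig, repl: arrays of shape (N, d). Row i of repl is the augmented
    replica of row i of orig; every other row in the batch is a negative.
    """
    n = orig.shape[0]
    z = l2_normalize(np.concatenate([orig, repl], axis=0))  # (2N, d)
    sim = z @ z.T / temperature                             # scaled pairwise inner products
    np.fill_diagonal(sim, -np.inf)                          # a sample is never its own match
    # The positive for row i is row i + N, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def mips_search(db, queries, k=1):
    """Exhaustive maximum inner-product search: top-k database indices per query."""
    scores = queries @ db.T                                 # (Q, M) inner products
    return np.argsort(-scores, axis=1)[:, :k]

# Toy usage with random vectors standing in for learned fingerprints.
rng = np.random.default_rng(0)
db = l2_normalize(rng.standard_normal((1000, 128)))         # stored segment fingerprints
queries = l2_normalize(db[:5] + 0.01 * rng.standard_normal((5, 128)))  # lightly distorted copies
top1 = mips_search(db, queries, k=1)[:, 0]
loss = contrastive_loss(db[:8], l2_normalize(db[:8] + 0.01 * rng.standard_normal((8, 128))))
```

In a real system the fingerprints would come from a trained network, and the exhaustive search would typically be replaced by an approximate index to keep lookup fast at scale.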
