Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Paper Title

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Authors

Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier

Abstract

Realistic recordings of soundscapes often contain multiple co-occurring sound events, such as car horns, engines, and human voices. Sound event retrieval is a type of content-based search that aims to find audio samples similar to an audio query, based on their acoustic or semantic content. State-of-the-art sound event retrieval models have focused on single-label audio recordings, in which only one sound event occurs, rather than on multi-label audio recordings (i.e., recordings in which multiple sound events occur). To address this latter problem, we propose different deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset, which contains both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.
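
The abstract does not define the Pairwise Presence Matrix, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each recording is described by a multi-hot label vector and that the matrix is the outer product of the two vectors of a query/candidate pair. The label set and the function names `multi_hot`, `pairwise_presence`, and `shared_events` are hypothetical and chosen for illustration.

```python
# Hypothetical illustration: the source abstract does not specify how the
# Pairwise Presence Matrix is built. Here it is assumed to be the outer
# product of two multi-hot label vectors.

# Assumed label set, taken from the abstract's example of co-occurring events.
LABELS = ["car horn", "engine", "human voice"]


def multi_hot(present, labels=LABELS):
    """Encode the set of sound events present in a recording as a 0/1 vector."""
    return [1 if name in present else 0 for name in labels]


def pairwise_presence(a, b):
    """Entry [i][j] is 1 when event i occurs in recording A and event j in recording B."""
    return [[x * y for y in b] for x in a]


def shared_events(matrix):
    """Sum the diagonal, i.e. count the events present in both recordings."""
    return sum(matrix[i][i] for i in range(len(matrix)))


query = multi_hot({"car horn", "engine"})
candidate = multi_hot({"engine", "human voice"})
matrix = pairwise_presence(query, candidate)

for row in matrix:
    print(row)
print("shared events:", shared_events(matrix))
```

Under this assumption, the diagonal marks events shared by the pair, while off-diagonal entries mark events that co-occur across the two recordings. A Siamese network could in principle use such a matrix as a pairwise training target, but that use is likewise an assumption.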
