A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

Title

A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

Authors

Archontis Politis, Sharath Adavanne, Tuomas Virtanen

Abstract

This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge. The SELD task refers to the problem of trying to simultaneously classify a known set of sound event classes, detect their temporal activations, and estimate their spatial directions or locations while they are active. To train and test SELD systems, datasets of diverse sound events occurring under realistic acoustic conditions are needed. Compared to the previous challenge, a significantly more complex dataset was created for DCASE 2020. The two key differences are a more diverse range of acoustical conditions, and dynamic conditions, i.e. moving sources. The spatial sound scenes are created using real room impulse responses captured in a continuous manner with a slowly moving excitation source. Both static and moving sound events are synthesized from them. Ambient noise recorded on location is added to complete the generation of scene recordings. A baseline SELD method accompanies the dataset, based on a convolutional recurrent neural network, to provide benchmark scores for the task. The baseline is an updated version of the one used in the previous challenge, with input features and training modifications to improve its performance.
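
The scene-generation idea described in the abstract (convolving sound events with measured room impulse responses, then adding ambient noise) can be illustrated with a minimal sketch. This is not the authors' generation code: the function names, the 24 kHz sampling rate, the four-channel layout, the 20 dB signal-to-noise ratio, and the random placeholder signals are all assumptions made for illustration. It covers only a static source; a moving source would require switching or interpolating between impulse responses over time, which is omitted here.

```python
import numpy as np


def spatialize_static_event(mono, rirs):
    """Convolve a mono event with one room impulse response per channel.

    mono: shape (n_samples,)
    rirs: shape (n_channels, rir_len)
    returns: shape (n_channels, n_samples + rir_len - 1)
    """
    return np.stack([np.convolve(mono, h) for h in rirs])


def add_ambient_noise(scene, noise, snr_db):
    """Scale multichannel noise to the requested SNR and add it to the scene."""
    noise = noise[:, : scene.shape[1]]  # trim noise to the scene length
    scene_power = np.mean(scene ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain chosen so that scene_power / (gain^2 * noise_power) = 10^(snr_db / 10)
    gain = np.sqrt(scene_power / (noise_power * 10 ** (snr_db / 10)))
    return scene + gain * noise


# Placeholder data (assumptions, not values from the dataset)
rng = np.random.default_rng(0)
fs = 24000          # assumed sampling rate in Hz
n_channels = 4      # assumed number of microphone channels
rir_len = 2400      # assumed impulse response length (0.1 s)

mono_event = rng.standard_normal(fs)  # 1 s of noise standing in for a sound event
decay = np.exp(-np.arange(rir_len) / 400.0)
rirs = rng.standard_normal((n_channels, rir_len)) * decay  # synthetic decaying RIRs
ambient = rng.standard_normal((n_channels, fs + rir_len))  # stand-in ambient noise

scene = spatialize_static_event(mono_event, rirs)
mixture = add_ambient_noise(scene, ambient, snr_db=20.0)
```

In a real pipeline the placeholder arrays would be replaced by recorded event samples, measured impulse responses, and on-location ambient recordings.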
