A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes

Paper title

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes

Paper authors

Francesca Ronchini, Romain Serizel

Abstract

This paper proposes a benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in sound event detection. The submissions are evaluated according to the two polyphonic sound detection score scenarios proposed for DCASE 2021 Challenge Task 4, which make it possible to analyze whether submissions are designed to perform fine-grained temporal segmentation, coarse-grained temporal segmentation, or to be polyvalent across the proposed scenarios. We study the solutions proposed by participants to analyze their robustness to varying levels of target to non-target signal-to-noise ratio and to the temporal localization of target sound events. A final experiment studies the impact of non-target events on system outputs. Results show that systems adapted to provide coarse segmentation outputs are more robust to different target to non-target signal-to-noise ratios and, with the help of specific data augmentation methods, more robust to the time localization of the original event. Results of the final experiment show that systems tend to spuriously predict short events when non-target events are present. This is particularly true for systems tailored to fine segmentation.
