Paper Title

Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection

Paper Authors

Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang

Paper Abstract

In recent years, exploring effective sound separation (SSep) techniques to improve overlapping sound event detection (SED) has attracted increasing attention. Creating accurate separated signals to avoid catastrophic error accumulation during SED model training is both important and challenging. In this study, we first propose a novel selective pseudo-labeling approach, termed SPL, to produce high-confidence separated target events from blind sound separation outputs. These target events are then used to fine-tune, in a multi-objective learning style, the original SED model pre-trained on the sound mixtures. To further leverage the SSep outputs, a class-wise discriminative fusion is proposed to improve the final SED performance by combining multiple frame-level event predictions of both the sound mixtures and their separated signals. All experiments are performed on the public DCASE 2021 Task 4 dataset, and results show that our approaches significantly outperform the official baseline: the collar-based F1, PSDS1 and PSDS2 performances are improved from 44.3%, 37.3% and 54.9% to 46.5%, 44.5% and 75.4%, respectively.
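To make the two ideas named in the abstract more concrete, below is a minimal Python sketch, assuming the mixture and each separated signal are scored by the same SED model to produce class posteriors, that pseudo labels are kept only when a clip-level posterior exceeds a confidence threshold, and that one fusion weight per event class is tuned on a validation set. The function names, array shapes, threshold, and weighting rule are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def select_pseudo_labeled_events(separated_clip_probs, threshold=0.9):
    # SPL-style selection (hypothetical): keep only separated signals whose
    # clip-level posterior for some class is high enough, and take the
    # arg-max class as the pseudo label for SED fine-tuning.
    # separated_clip_probs: (N, C) clip-level posteriors, one row per separated signal
    # returns: list of (signal_index, class_index) pairs judged reliable
    selected = []
    for i, probs in enumerate(separated_clip_probs):
        c = int(np.argmax(probs))
        if probs[c] >= threshold:  # high-confidence target event only
            selected.append((i, c))
    return selected

def classwise_discriminative_fusion(mixture_probs, separated_probs, class_weights):
    # Fuse frame-level predictions of the mixture and its separated signals
    # with one weight per event class (e.g. tuned on a validation set).
    # mixture_probs:   (T, C) frame-level posteriors from the sound mixture
    # separated_probs: list of (T, C) posteriors, one per separated signal
    # class_weights:   (C,) weights in [0, 1]; larger values trust the mixture more
    sep_mean = np.mean(np.stack(separated_probs, axis=0), axis=0)  # (T, C)
    return class_weights * mixture_probs + (1.0 - class_weights) * sep_mean

In this sketch, the events returned by select_pseudo_labeled_events would serve as the high-confidence targets for fine-tuning the mixture-pretrained SED model, while classwise_discriminative_fusion combines the frame-level outputs at inference time before thresholding and post-processing.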
