An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

Title

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

Authors

Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Abstract

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. Firstly, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output format and solve the accompanying track permutation problem with permutation-invariant training. Multi-head self-attention is further used to separate tracks. Secondly, a previous finding is that, by using hard parameter-sharing, SELD suffers from a performance loss compared with learning the subtasks separately. This is solved by a soft parameter-sharing scheme. We call the proposed method Event-Independent Network V2 (EINV2); it is an improved version of our previously proposed method and an end-to-end network for SELD. We show that our proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
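The permutation-invariant training (PIT) idea in the abstract can be illustrated with a small sketch. This is not the EINV2 implementation. It assumes frame-level PIT over N output tracks, where each track carries per-class SED probabilities and a Cartesian DoA vector, a mean binary cross-entropy for SED, a mean squared error for DoA that is counted only when the target track is active, and a weight `beta` between the two. The names `pit_loss`, `track_loss`, and `beta` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical per-frame track layout: (SED class probabilities, DoA x/y/z vector).
Track = Tuple[Sequence[float], Sequence[float]]


def bce(p: Sequence[float], y: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over classes."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi))
    return total / len(p)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two DoA vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def track_loss(pred: Track, target: Track, beta: float) -> float:
    """Weighted SED + DoA loss for one (predicted track, target track) pair."""
    sed_p, doa_p = pred
    sed_y, doa_y = target
    # Assumption: the DoA error only counts when the target track holds an event.
    active = 1.0 if any(v > 0.5 for v in sed_y) else 0.0
    return beta * bce(sed_p, sed_y) + (1.0 - beta) * active * mse(doa_p, doa_y)


def pit_loss(preds: List[Track], targets: List[Track], beta: float = 0.5):
    """Try every assignment of predicted tracks to target tracks and keep the
    cheapest one. perm[i] is the index of the prediction matched to target i."""
    assert len(preds) == len(targets)
    best_loss, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(preds))):
        loss = sum(track_loss(preds[j], targets[i], beta) for i, j in enumerate(perm))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    # Two overlapping events of the SAME class (class 0) at different DoAs;
    # the network happened to emit them on swapped tracks.
    targets = [([1.0, 0.0], [1.0, 0.0, 0.0]), ([1.0, 0.0], [0.0, 1.0, 0.0])]
    preds = [([0.9, 0.1], [0.0, 1.0, 0.0]), ([0.9, 0.1], [1.0, 0.0, 0.0])]
    loss, perm = pit_loss(preds, targets)
    print(perm, round(loss, 4))
```

Because both events share a class, a classwise output could not represent them separately; with trackwise outputs, PIT simply picks the swapped assignment `(1, 0)` instead of penalizing the network for the arbitrary track order. Enumerating all N! permutations is cheap only when the number of tracks is small.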
