Paper Title

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

Authors

Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Abstract

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously. We study the SELD task from a multi-task learning perspective. Two open problems are addressed in this paper. Firstly, to detect overlapping sound events of the same type but with different DoAs, we propose to use a trackwise output format and solve the accompanying track permutation problem with permutation-invariant training. Multi-head self-attention is further used to separate tracks. Secondly, a previous finding is that, by using hard parameter-sharing, SELD suffers from a performance loss compared with learning the subtasks separately. This is solved by a soft parameter-sharing scheme. We term the proposed method as Event Independent Network V2 (EINV2), which is an improved version of our previously-proposed method and an end-to-end network for SELD. We show that our proposed EINV2 for joint SED and DoA estimation outperforms previous methods by a large margin, and has comparable performance to state-of-the-art ensemble models.
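The track permutation problem mentioned in the abstract can be illustrated with a minimal sketch of permutation-invariant training. This is not the authors' implementation: the function names `track_loss` and `pit_loss`, the use of mean squared error, and the two-dimensional DoA vectors are all assumptions made for illustration. The idea shown is only that the loss is evaluated for every assignment of target events to output tracks, and the smallest value is kept.

```python
from itertools import permutations


def track_loss(pred, target):
    """Mean squared error between one predicted track and one target track.

    MSE is an assumed stand-in; the abstract does not specify the loss.
    """
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pit_loss(pred_tracks, target_tracks):
    """Return (loss, best_perm): the lowest average loss over all
    assignments of targets to predicted tracks, and that assignment.

    best_perm[i] is the index of the target matched to predicted track i.
    """
    n = len(pred_tracks)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(
            track_loss(pred_tracks[i], target_tracks[j])
            for i, j in enumerate(perm)
        ) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Hypothetical example: two events of the same class at different
# directions, with the network emitting them in the opposite track order.
pred = [[0.0, 1.0], [1.0, 0.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
loss, perm = pit_loss(pred, target)
print(loss, perm)
```

With a fixed track order the predictions above would be penalized even though both directions are correct; taking the minimum over permutations removes that penalty. Enumerating all permutations is only practical for a small number of tracks.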
