A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

Paper title

A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

Authors

Nguyen, Thi Ngoc Tho; Nguyen, Ngoc Khanh; Phan, Huy; Pham, Lam; Ooi, Kenneth; Jones, Douglas L.; Gan, Woon-Seng

Abstract

The polyphonic sound event localization and detection (SELD) task is challenging because it is difficult to jointly optimize sound event detection (SED) and direction-of-arrival (DOA) estimation in the same network. We propose a general network architecture for SELD in which the SELD network comprises sub-networks that are pretrained to solve SED and DOA estimation independently, and a recurrent layer that combines the SED and DOA estimation outputs into SELD outputs. The recurrent layer aligns the sound classes with the DOAs of sound events while remaining unaware of how these outputs are produced by the upstream SED and DOA estimation algorithms. This simple network architecture is compatible with different existing SED and DOA estimation algorithms. It is highly practical because the sub-networks can be improved independently. Experimental results on the DCASE 2020 SELD dataset show that the proposed architecture, using different SED and DOA estimation algorithms and different audio formats, performs competitively with other state-of-the-art SELD algorithms. The source code for the proposed SELD network architecture is available on GitHub.
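
The fusion step described in the abstract can be illustrated with a minimal, untrained sketch. This is not the authors' implementation: the class name `RecurrentFusion`, the plain Elman-style recurrence, the hidden size, the random weights, and the per-class Cartesian (x, y, z) output format are all assumptions made for illustration. It only shows the shape of the idea: per-frame SED and DOA outputs from two independent upstream sub-networks are concatenated and passed through a recurrent layer that emits a class activity and a DOA per class.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (stands in for trained weights)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RecurrentFusion:
    """Hypothetical Elman-style layer fusing per-frame SED and DOA outputs."""

    def __init__(self, n_classes, n_doa_feats, hidden=8, seed=0):
        rng = random.Random(seed)
        in_dim = n_classes + n_doa_feats
        self.n_classes = n_classes
        self.hidden = hidden
        self.w_in = make_matrix(hidden, in_dim, rng)          # input -> hidden
        self.w_rec = make_matrix(hidden, hidden, rng)         # hidden -> hidden
        self.w_cls = make_matrix(n_classes, hidden, rng)      # class activity head
        self.w_doa = make_matrix(n_classes * 3, hidden, rng)  # (x, y, z) per class

    def forward(self, sed_frames, doa_frames):
        """Return one (activity, doa) pair per input frame."""
        h = [0.0] * self.hidden
        outputs = []
        for sed_t, doa_t in zip(sed_frames, doa_frames):
            # The layer sees only the upstream outputs, not how they were made.
            x = list(sed_t) + list(doa_t)
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
            activity = [sigmoid(v) for v in matvec(self.w_cls, h)]
            flat = [math.tanh(v) for v in matvec(self.w_doa, h)]
            doa = [flat[3 * c:3 * c + 3] for c in range(self.n_classes)]
            outputs.append((activity, doa))
        return outputs


# Made-up upstream outputs for two frames: 2 class scores and 3 DOA features.
sed_frames = [[0.9, 0.1], [0.8, 0.7]]
doa_frames = [[0.5, 0.5, 0.0], [0.1, -0.9, 0.2]]

fusion = RecurrentFusion(n_classes=2, n_doa_feats=3)
seld = fusion.forward(sed_frames, doa_frames)
```

Because the layer consumes only the upstream outputs, either sub-network could in principle be swapped for another SED or DOA estimator with the same output shape, which is the compatibility property the abstract emphasizes. In practice the weights would be trained, and a gated recurrent unit would be a more typical choice than the plain recurrence assumed here.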
