Paper Title
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
Paper Authors
Paper Abstract
Polyphonic sound event detection and direction-of-arrival estimation require different input features from audio signals. While sound event detection mainly relies on time-frequency patterns, direction-of-arrival estimation relies on magnitude or phase differences between microphones. Previous approaches use the same input features for sound event detection and direction-of-arrival estimation, and train the two tasks jointly or in a two-stage transfer-learning manner. We propose a two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems. In the first step, we detect the sound events and estimate the directions-of-arrival separately to optimize the performance of each system. In the second step, we train a deep neural network to match the two output sequences of the event detector and the direction-of-arrival estimator. This modular and hierarchical approach allows flexibility in the system design and increases the performance of the whole sound event localization and detection system. Experimental results on the DCASE 2019 sound event localization and detection dataset show improved performance compared to previous state-of-the-art solutions.
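To make the second step concrete, the following is a minimal sketch of one possible sequence matching network: a recurrent model that takes the frame-wise output sequences of a pre-trained sound event detector (SED) and a pre-trained direction-of-arrival (DOA) estimator and learns to align them, so that each detected event class is paired with a direction. This is not the authors' exact architecture; the layer sizes, the bidirectional GRU, the number of event classes, and the azimuth/elevation parameterization are all illustrative assumptions.

import torch
import torch.nn as nn


class SequenceMatchingNetwork(nn.Module):
    """Fuses SED and DOA output sequences (illustrative sketch, not the paper's exact model)."""

    def __init__(self, n_classes: int = 11, n_doa: int = 2, hidden: int = 128):
        super().__init__()
        # Each frame's input is the SED class probabilities concatenated with
        # the DOA estimates (here assumed to be azimuth and elevation).
        self.rnn = nn.GRU(
            input_size=n_classes + n_doa,
            hidden_size=hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Output heads: refined per-class event activity and one direction per class.
        self.event_head = nn.Linear(2 * hidden, n_classes)
        self.doa_head = nn.Linear(2 * hidden, n_classes * n_doa)

    def forward(self, sed_seq: torch.Tensor, doa_seq: torch.Tensor):
        # sed_seq: (batch, time, n_classes)  frame-wise event probabilities from the SED system
        # doa_seq: (batch, time, n_doa)      frame-wise DOA estimates from the DOA system
        x = torch.cat([sed_seq, doa_seq], dim=-1)
        h, _ = self.rnn(x)
        events = torch.sigmoid(self.event_head(h))  # per-class activity in [0, 1]
        doas = self.doa_head(h)                     # per-class direction estimates
        return events, doas


if __name__ == "__main__":
    # Random tensors stand in for the two first-step output sequences.
    model = SequenceMatchingNetwork()
    sed = torch.rand(4, 100, 11)  # 4 clips, 100 frames, 11 event classes
    doa = torch.rand(4, 100, 2)   # azimuth/elevation per frame
    events, doas = model(sed, doa)
    print(events.shape, doas.shape)  # torch.Size([4, 100, 11]) torch.Size([4, 100, 22])

Because the SED and DOA systems are trained separately, a matching stage of this kind can be retrained on its own whenever either front-end is replaced, which is the flexibility the modular design refers to.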