Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

Paper Title

Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

Authors

Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

Abstract

Existing systems for sound event localization and detection (SELD) typically operate by estimating a source location for all classes at every time instant. In this paper, we propose an alternative class-conditioned SELD model for situations where we may not be interested in localizing all classes all of the time. This class-conditioned SELD model takes as input the spatial and spectral features from the sound file, and also a one-hot vector indicating the class we are currently interested in localizing. We inject the conditioning information at several points in our model using feature-wise linear modulation (FiLM) layers. Through experiments on the DCASE 2020 Task 3 dataset, we show that the proposed class-conditioned SELD model performs better in terms of common SELD metrics than the baseline model that locates all classes simultaneously, and also outperforms specialist models that are trained to locate only a single class of interest. We also evaluate performance on the DCASE 2021 Task 3 dataset, which includes directional interference (sound events from classes we are not interested in localizing) and notice especially strong improvement from the class-conditioned model.
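
The conditioning mechanism named in the abstract can be illustrated with a minimal sketch of a FiLM layer driven by a one-hot class vector. The abstract does not give layer sizes, the number or placement of FiLM layers, or the network around them, so everything below (the names `one_hot` and `FiLM`, the four-class setup, the random initialization) is a hypothetical illustration of the general technique, not the authors' implementation. FiLM maps the conditioning vector to a per-channel scale (gamma) and shift (beta) and applies them to each feature channel.

```python
import random


def one_hot(class_index, num_classes):
    """Return a one-hot list selecting the class we want to localize."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


class FiLM:
    """Feature-wise linear modulation: out = gamma(cond) * x + beta(cond).

    gamma and beta are linear functions of the conditioning vector,
    with one value per feature channel. Weights are random here; in a
    trained model they would be learned (assumption for illustration).
    """

    def __init__(self, num_classes, num_channels, seed=0):
        rng = random.Random(seed)
        # Weight matrices of shape [num_channels][num_classes].
        self.w_gamma = [[rng.gauss(0.0, 0.1) for _ in range(num_classes)]
                        for _ in range(num_channels)]
        self.w_beta = [[rng.gauss(0.0, 0.1) for _ in range(num_classes)]
                       for _ in range(num_channels)]
        # Biases chosen so the layer starts close to the identity map.
        self.b_gamma = [1.0] * num_channels
        self.b_beta = [0.0] * num_channels

    def params(self, cond):
        """Compute per-channel (gamma, beta) from the conditioning vector."""
        gamma = [sum(w * c for w, c in zip(row, cond)) + b
                 for row, b in zip(self.w_gamma, self.b_gamma)]
        beta = [sum(w * c for w, c in zip(row, cond)) + b
                for row, b in zip(self.w_beta, self.b_beta)]
        return gamma, beta

    def __call__(self, features, cond):
        """Modulate features of shape [num_channels][num_frames]."""
        gamma, beta = self.params(cond)
        return [[g * x + b for x in channel]
                for channel, g, b in zip(features, gamma, beta)]


# Usage: modulate a toy 3-channel, 2-frame feature map for class 2 of 4.
film = FiLM(num_classes=4, num_channels=3)
features = [[1.0, 2.0], [0.5, -0.5], [0.0, 3.0]]
out = film(features, one_hot(2, 4))
```

Because the one-hot vector selects a single column of each weight matrix, changing the requested class changes the scale and shift applied to every channel while the rest of the network stays shared, which is what lets one model serve all classes.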
