On Multitask Loss Function for Audio Event Detection and Localization

Paper Title

On Multitask Loss Function for Audio Event Detection and Localization

Authors

Huy Phan, Lam Pham, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin, Alfred Mertins

Abstract

Audio event localization and detection (SELD) have been commonly tackled using multitask models. Such a model usually consists of a multi-label event classification branch with sigmoid cross-entropy loss for event activity detection and a regression branch with mean squared error loss for direction-of-arrival estimation. In this work, we propose a multitask regression model, in which both (multi-label) event detection and localization are formulated as regression problems and use the mean squared error loss homogeneously for model training. We show that the common combination of heterogeneous loss functions causes the network to underfit the data whereas the homogeneous mean squared error loss leads to better convergence and performance. Experiments on the development and validation sets of the DCASE 2020 SELD task demonstrate that the proposed system also outperforms the DCASE 2020 SELD baseline across all the detection and localization metrics, reducing the overall SELD error (the combined metric) by approximately 10% absolute.
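
The contrast the abstract draws between the two training objectives can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the array shapes, the `doa_weight` parameter, and the choice to feed activity scores in [0, 1] to the regression variant are all hypothetical and do not come from the paper.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error averaged over all elements."""
    return float(np.mean((pred - target) ** 2))


def sigmoid_cross_entropy(logits, target):
    """Numerically stable sigmoid cross-entropy, averaged over all elements."""
    per_element = (
        np.maximum(logits, 0.0)
        - logits * target
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(np.mean(per_element))


def heterogeneous_loss(sed_logits, doa_pred, sed_target, doa_target, doa_weight=1.0):
    """Common setup: cross-entropy for event detection plus MSE for direction of arrival.

    doa_weight is a hypothetical balancing factor, not a value from the paper.
    """
    return sigmoid_cross_entropy(sed_logits, sed_target) + doa_weight * mse(doa_pred, doa_target)


def homogeneous_loss(sed_scores, doa_pred, sed_target, doa_target, doa_weight=1.0):
    """Regression setup: MSE for both branches.

    sed_scores are assumed to be activity scores in [0, 1]; how the network
    produces them is an assumption of this sketch.
    """
    return mse(sed_scores, sed_target) + doa_weight * mse(doa_pred, doa_target)


# Toy batch: 2 frames, 3 event classes, 6 direction-of-arrival values per frame.
sed_target = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
doa_target = np.zeros((2, 6))
uninformed_logits = np.zeros((2, 3))

loss_mixed = heterogeneous_loss(uninformed_logits, doa_target, sed_target, doa_target)
loss_mse = homogeneous_loss(sed_target, doa_target, sed_target, doa_target)
```

In this sketch both terms of the homogeneous variant are squared errors, whereas the heterogeneous variant sums two quantities of different kinds. The weighting between the terms is left as a free parameter here.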
