Deep Feature Embedding and Hierarchical Classification for Audio Scene Classification

Paper Title

Deep Feature Embedding and Hierarchical Classification for Audio Scene Classification

Authors

Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan, Alfred Mertins

Abstract

In this work, we propose an approach to Acoustic Scene Classification (ASC) that combines deep feature embedding learning with hierarchical classification using a triplet loss function. On the one hand, a deep convolutional neural network is first trained to learn a feature embedding from scene audio signals. The trained network then maps each input into the embedding feature space, producing a high-level feature vector that represents it. On the other hand, to exploit the structure of the scene categories, the original scene classification problem is organized into a hierarchy in which similar categories are grouped into meta-categories. Hierarchical classification is then carried out by deep neural network classifiers associated with the triplet loss function. Our experiments show that the proposed system achieves good performance on both the DCASE 2018 Task 1A and 1B datasets, with absolute accuracy gains of 15.6% and 16.6% over the DCASE 2018 baseline on Task 1A and 1B, respectively.
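
The two ideas named in the abstract, a triplet loss and a two-level (meta-category, then category) decision, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the margin value, the Euclidean distance, the function names, and the grouping of scenes into meta-categories are all assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive sample
    than to the negative sample by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical grouping of scene classes into meta-categories.
# The abstract does not state the grouping; this is illustrative only.
META_CATEGORIES = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "public_square", "street_pedestrian", "street_traffic"],
    "transport": ["bus", "metro", "tram"],
}

def hierarchical_predict(meta_scores, class_scores_by_meta):
    """Pick the best meta-category first, then the best class inside it.

    meta_scores: dict mapping meta-category name -> score
    class_scores_by_meta: dict mapping meta-category name ->
        dict of class name -> score (from that meta-category's classifier)
    """
    best_meta = max(meta_scores, key=meta_scores.get)
    class_scores = class_scores_by_meta[best_meta]
    best_class = max(class_scores, key=class_scores.get)
    return best_meta, best_class
```

In this sketch, a triplet whose negative sample is already far from the anchor contributes no loss, so training effort goes to triplets that still violate the margin. The two-stage prediction only consults the classifier of the selected meta-category, which is one simple way to realize a hierarchy; the paper may combine the levels differently.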
