Title


Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

Authors

Dat Ngo, Hao Hoang, Anh Nguyen, Tien Ly, Lam Pham

Abstract


In this paper, we present a deep learning framework for Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds. An ASC system generally comprises two main steps, referred to as front-end feature extraction and back-end classification. In the first step, an extractor derives low-level features from the raw audio signal. Next, the extracted discriminative features are fed into a classifier, which reports the accuracy results. Aiming to develop a robust framework for ASC, we address existing issues in both the front-end and back-end components of an ASC system and present three main contributions. First, we carry out a comprehensive analysis of the spectrogram representations extracted from sound scene input and propose the best multi-spectrogram combinations. In terms of back-end classification, we propose a novel joint learning architecture using parallel convolutional recurrent networks, which effectively learns the spatial features and temporal sequences of the spectrogram input. Finally, good experimental results on the benchmark datasets of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Task 1A & 1B, and LITIS Rouen prove that our proposed framework is general and robust for the ASC task.
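The "multi-spectrogram" front-end described in the abstract amounts to computing several time-frequency representations of the same audio clip and combining them. The sketch below is only an illustration of that idea using NumPy, not the paper's actual extractor: it derives two variants (a log-linear STFT spectrogram and a log-mel spectrogram) from a synthetic signal. All function names, window/hop sizes, and the 40-band mel setting are assumptions for the example.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude STFT via a sliding Hann window -> (frames, freq bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank of shape (n_mels, n_fft//2 + 1)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):           # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

sr = 8000
t = np.arange(sr) / sr                          # 1 s of synthetic "audio"
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(sr)

# Two spectrogram views of the same clip, ready to feed parallel branches.
lin_spec = np.log(stft_mag(x) + 1e-8)                                   # (61, 129)
mel_spec = np.log(stft_mag(x) @ mel_filterbank(40, 256, sr).T + 1e-8)   # (61, 40)
print(lin_spec.shape, mel_spec.shape)
```

In the paper's setting, each such representation would be fed to one branch of the parallel convolutional recurrent back-end, so the network can exploit complementary views of the same scene.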
