A Comparative Study on Approaches to Acoustic Scene Classification using CNNs

Paper Title

A Comparative Study on Approaches to Acoustic Scene Classification using CNNs

Authors

Ananya, Ishrat Jahan; Suad, Sarah; Choudhury, Shadab Hafiz; Khan, Mohammad Ashrafuzzaman

Abstract

Acoustic scene classification is the process of characterizing and classifying environments from sound recordings. The first step is to generate features (representations) from the recorded sound; the background environment is then classified from those features. However, the choice of representation has a dramatic effect on classification accuracy. In this paper, we explore the effect of three such representations on classification accuracy using neural networks. We investigate spectrogram, MFCC, and embedding representations using different CNN networks and autoencoders. Our dataset consists of sounds from three indoor and three outdoor settings, and thus covers six different kinds of environments. We find that the spectrogram representation yields the highest classification accuracy, while MFCC yields the lowest. We report our findings and insights, as well as some guidelines for achieving better accuracy in environment classification using sounds.
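As a rough illustration of the feature-generation step the abstract mentions, the following sketch computes a log-magnitude spectrogram with NumPy. It is not the authors' pipeline: the function name `log_spectrogram`, the frame length, the hop size, the Hann window, and the synthetic 440 Hz test tone are all assumptions made for this example.

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512, eps=1e-10):
    """Return a log-magnitude spectrogram in dB.

    Output shape is (n_frames, frame_len // 2 + 1). The frame length and
    hop size are illustrative defaults, not values taken from the paper.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids log(0) for silent bins.
    return 20.0 * np.log10(magnitude + eps)

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
```

A 2-D array of this kind can be treated as a single-channel image and passed to a CNN classifier, which is the general idea behind spectrogram-based scene classification.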
