Vocal Breath Sound Based Gender Classification

Title

Vocal Breath Sound Based Gender Classification

Authors

Mohammad Shaique Solanki; Ashutosh M Bharadwaj; Jeevan K; Prasanta Kumar Ghosh

Abstract

Voiced speech signals such as continuous speech are known to have acoustic features, such as pitch (F0) and formant frequencies (F1, F2, F3), that can be used for gender classification. However, gender classification using non-speech signals such as vocal breath sounds has not been explored, because these signals lack typical gender-specific acoustic features. In this work, we explore whether vocal breath sounds encode gender information and, if so, to what extent they can be used for automatic gender classification. We examine data-driven and knowledge-based features extracted from vocal breath sounds, as well as the effect of classifier complexity on gender classification. We also explore the importance of the location and duration of the breath signal segments used for automatic classification. Experiments with 54.23 minutes of male and 51.83 minutes of female breath sounds reveal that knowledge-based features, namely MFCC statistics, with a low-complexity classifier perform comparably to data-driven features with classifiers of higher complexity. Breath segments with an average duration of 3 seconds are found to be the best choice irrespective of their location, which avoids the need for breath cycle boundary annotation.
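As a rough illustration of "MFCC statistics" computed over fixed-length segments, the sketch below cuts a sequence of per-frame feature vectors into 3-second segments at fixed offsets, without any breath-cycle boundaries, and summarizes each segment by per-coefficient mean and standard deviation. The abstract does not say which statistics, frame rate, or MFCC configuration the authors used, so `FRAME_RATE`, `N_COEFFS`, and the helper names are hypothetical. The random values are only a stand-in for real MFCC frames.

```python
import random
from statistics import fmean, pstdev


def segment_bounds(n_frames, frames_per_segment, hop):
    """Return (start, end) frame indices of fixed-length segments.

    Segments are placed at regular offsets, so no breath-cycle
    boundary annotation is needed.
    """
    last_start = n_frames - frames_per_segment
    return [(s, s + frames_per_segment) for s in range(0, last_start + 1, hop)]


def segment_statistics(frames):
    """Collapse per-frame feature vectors into one segment-level vector.

    The output is the per-coefficient means followed by the
    per-coefficient (population) standard deviations.
    """
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Assumed values for illustration only (not taken from the paper).
FRAME_RATE = 100   # feature frames per second
N_COEFFS = 13      # coefficients per frame
SEGMENT_SECONDS = 3

# Placeholder for real MFCC frames of a 10-second recording.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_COEFFS)]
          for _ in range(10 * FRAME_RATE)]

frames_per_segment = SEGMENT_SECONDS * FRAME_RATE
bounds = segment_bounds(len(frames), frames_per_segment, hop=frames_per_segment)
features = [segment_statistics(frames[start:end]) for start, end in bounds]
```

Each entry of `features` is a fixed-length vector that could be passed to a simple classifier. In practice the placeholder frames would be replaced by MFCCs computed from the breath recordings with an audio library.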
