InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Paper Title

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Authors

Ebrahimpour, Mohammad K., Schneider, Sara, Noelle, David C., Kello, Christopher T.

Abstract

Acoustic analyses of infant vocalizations are valuable for research on speech development as well as applications in sound classification. Previous studies have focused on measures of acoustic features based on theories of speech processing, such as spectral and cepstrum-based analyses. More recently, end-to-end deep learning models have been developed that take raw speech signals (acoustic waveforms) as inputs and use convolutional neural network layers to learn representations of speech sounds based on classification tasks. We applied a recent end-to-end model of sound classification to analyze a large-scale database of labeled infant and adult vocalizations recorded in natural settings outside the lab with no control over recording conditions. The model learned basic classifications like infant versus adult vocalizations, infant speech-related versus non-speech vocalizations, and canonical versus non-canonical babbling. The model was trained on recordings of infants ranging from 3 to 18 months of age, and classification accuracy changed with age as speech became more distinct and babbling became more speech-like. Further work is needed to validate and explore the model and dataset, but our results show how deep learning can be used to measure and investigate speech acquisition and development, with potential applications in speech pathology and infant monitoring.
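
The end-to-end idea described in the abstract (raw waveform in, convolutional filters, class probabilities out) can be illustrated with a minimal sketch. This is not the InfantNet architecture, which the abstract does not specify. The filter count, kernel size, stride, random untrained parameters, and the synthetic sine-wave input are all assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation of a waveform with a single filter."""
    span = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(0, span, stride)
    ]


def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(waveform, filters, weights, biases):
    """Raw waveform -> conv filters -> ReLU -> global max pool -> linear -> softmax."""
    features = [max(relu(conv1d(waveform, k, stride=4))) for k in filters]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical setup: random, untrained parameters and two classes
# (for example, infant versus adult vocalization).
rng = random.Random(0)
filters = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
weights = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(2)]
biases = [0.0, 0.0]

# A synthetic 220 Hz tone sampled at 8 kHz stands in for a recorded vocalization.
waveform = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(800)]
probs = classify(waveform, filters, weights, biases)
```

In a trained model the filters and classifier weights would be learned from labeled recordings rather than drawn at random, so the probabilities produced here carry no meaning beyond showing the data flow.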
