Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder

Paper Title

Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder

Authors

Ng, Si-Ioi; Lee, Tan

Abstract

Speech sound disorder (SSD) is a developmental disorder in which children have persistent difficulty pronouncing words correctly. Assessment of SSD has relied largely on trained speech and language pathologists (SLPs). With the increasing demand for, and long-standing shortage of, SLPs, automated assessment of speech disorders is a highly desirable way to assist clinical work. This paper describes a study on the automatic detection of phonological errors in the Cantonese speech of kindergarten children, based on a newly collected large speech corpus. The proposed approach to speech error detection uses a Siamese recurrent autoencoder, which is trained to learn the similarity and discrepancy between phone segments in the embedding space. Training the model requires only speech data from typically developing (TD) children. To distinguish disordered speech from typical speech, the cosine distance between the embeddings of the test segment and the reference segment is computed. Different model architectures and training strategies are explored. Results on detecting the 6 most common consonant errors demonstrate satisfactory performance of the proposed model, with average precision values ranging from 0.82 to 0.93.
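
The scoring step mentioned in the abstract, comparing a test segment with a reference segment by cosine distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are assumed to be plain lists of floats already produced by some encoder, and the helper `is_error` and its `threshold` value are hypothetical, chosen only for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def is_error(test_emb, ref_emb, threshold=0.5):
    """Flag a test segment as a possible error when it lies far from the reference.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return cosine_distance(test_emb, ref_emb) > threshold


# Toy embeddings (made up for illustration, not taken from the paper).
reference = [1.0, 0.0]
similar_segment = [2.0, 0.0]      # same direction as the reference
dissimilar_segment = [0.0, 3.0]   # orthogonal to the reference

print(is_error(similar_segment, reference))
print(is_error(dissimilar_segment, reference))
```

Because cosine distance ignores vector magnitude, only the direction of the embedding matters here. Sweeping the threshold over a labelled test set is what would yield a precision-recall curve and, from it, an average precision value of the kind reported in the abstract.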

Paper Title