Paper Title
Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Paper Authors
Paper Abstract
Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical practice, hypernasality estimation is crucial in cleft palate diagnosis, as its results determine the subsequent surgery and additional speech therapy. Therefore, designing an automatic hypernasality assessment method will help speech-language pathologists make precise diagnoses. Existing methods for hypernasality estimation only conduct acoustic analysis on low-resource cleft palate datasets, using statistical or neural-network-based features. In this paper, we propose a novel approach that uses an automatic speech recognition model to improve hypernasality estimation. Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective using a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation. Benefiting from this design, our hypernasality estimation model enjoys the advantages of the ASR model: 1) compared with the low-resource cleft palate dataset, the ASR task usually includes large-scale speech data in the general domain, which enables better model generalization; 2) the text annotations in the ASR dataset guide the model to extract better acoustic features. Experimental results on two cleft palate datasets demonstrate that our method achieves superior performance compared with previous approaches.
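To make the two-stage recipe in the abstract concrete, below is a minimal, hypothetical sketch: an encoder-decoder model is pre-trained with an ASR objective on speech-to-text data, then the decoder is discarded and the encoder is reused with a small classification head for hypernasality estimation. All module names, layer sizes, and the mean-pooling head are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of ASR pre-training followed by encoder fine-tuning
# for hypernasality estimation. Sizes and names are assumptions for illustration.
import torch
import torch.nn as nn


class ASREncoder(nn.Module):
    """Transformer encoder over acoustic features (e.g., 80-dim filterbanks)."""

    def __init__(self, feat_dim=80, d_model=256, n_layers=6, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, feats):                    # feats: (batch, time, feat_dim)
        return self.encoder(self.proj(feats))    # (batch, time, d_model)


class ASRModel(nn.Module):
    """Encoder-decoder model trained with a speech-to-text (ASR) objective."""

    def __init__(self, vocab_size, d_model=256, n_layers=6, n_heads=4):
        super().__init__()
        self.encoder = ASREncoder(d_model=d_model, n_layers=n_layers, n_heads=n_heads)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, token_ids):
        memory = self.encoder(feats)
        dec = self.decoder(self.embed(token_ids), memory)
        return self.out(dec)                     # (batch, text_len, vocab_size)


class HypernasalityClassifier(nn.Module):
    """Fine-tuning model: pre-trained ASR encoder + mean pooling + linear head."""

    def __init__(self, pretrained_encoder, d_model=256, num_classes=4):
        super().__init__()
        self.encoder = pretrained_encoder        # weights come from ASR pre-training
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, feats):
        hidden = self.encoder(feats)             # (batch, time, d_model)
        return self.head(hidden.mean(dim=1))     # utterance-level severity logits


if __name__ == "__main__":
    # Stage 1 (sketch): pre-train on a large general-domain speech-to-text corpus.
    asr = ASRModel(vocab_size=5000)
    feats = torch.randn(2, 200, 80)              # dummy filterbank features
    tokens = torch.randint(0, 5000, (2, 20))     # dummy transcript token ids
    asr_logits = asr(feats, tokens)

    # Stage 2 (sketch): transfer the encoder and fine-tune on cleft palate speech
    # labelled with hypernasality severity (assumed here to have 4 grades).
    clf = HypernasalityClassifier(asr.encoder, num_classes=4)
    severity_logits = clf(feats)
    print(asr_logits.shape, severity_logits.shape)
```

The design choice illustrated here is that only the encoder is transferred: the ASR decoder exists solely to force the encoder to produce text-discriminative acoustic representations during pre-training, which the low-resource hypernasality classifier can then reuse.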