Paper title
Automatic dysarthric speech detection exploiting pairwise distance-based convolutional neural networks
Paper authors
Abstract
Automatic dysarthric speech detection can provide reliable and cost-effective computer-aided tools to assist the clinical diagnosis and management of dysarthria. In this paper, we propose a novel automatic dysarthric speech detection approach based on analyses of pairwise distance matrices using convolutional neural networks (CNNs). We represent utterances through articulatory posteriors and consider pairs of phonetically balanced representations, with one representation from a healthy speaker (i.e., the reference representation) and the other from the test speaker (i.e., the test representation). Given such pairs of reference and test representations, features are first extracted using a feature extraction front-end, a frame-level distance matrix is computed, and the resulting distance matrix is treated as an image by a CNN-based binary classifier. The feature extraction, distance matrix computation, and CNN-based classifier are jointly optimized in an end-to-end framework. Experimental results on two databases of healthy and dysarthric speakers for different languages and pathologies show that the proposed approach yields a high dysarthric speech detection performance, outperforming other CNN-based baseline approaches.
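The frame-level distance matrix described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the feature dimensions, or the CNN architecture, so everything below is an assumption made for illustration: Euclidean distance, two-dimensional toy frames standing in for articulatory posteriors, and a single hand-written 2x2 kernel standing in for a learned convolutional layer. The function names `pairwise_distance_matrix` and `conv2d_valid` are hypothetical, and the joint end-to-end optimization is not shown.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def pairwise_distance_matrix(
    reference: Sequence[Frame], test: Sequence[Frame]
) -> List[List[float]]:
    """Return D where D[i][j] is the Euclidean distance between
    reference frame i and test frame j (metric is an assumption)."""
    return [[math.dist(r, t) for t in test] for r in reference]


def conv2d_valid(
    image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Plain 'valid' 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy stand-ins for frame-level representations (3 reference frames, 2 test frames).
reference = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
test = [[1.0, 0.0], [0.0, 1.0]]

# The distance matrix has one row per reference frame and one column per test frame.
distances = pairwise_distance_matrix(reference, test)

# Treat the distance matrix as a single-channel image and apply one fixed kernel.
feature_map = conv2d_valid(distances, [[1.0, -1.0], [-1.0, 1.0]])
```

In this sketch `distances` is a 3x2 matrix and `feature_map` is 2x1. In the approach the abstract describes, the kernel weights and the feature extraction front-end would be learned jointly rather than fixed by hand.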