Title
Continuous Speech for Improved Learning Pathological Voice Disorders
Authors
Abstract
Goal: Numerous studies have successfully differentiated normal from abnormal voice samples, but further classification has rarely been attempted. This study proposes a novel approach that uses continuous Mandarin speech, instead of a single vowel, to classify four common voice disorders: functional dysphonia, neoplasm, phonotrauma, and vocal palsy.
Methods: In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients (MFCCs), and a bidirectional long short-term memory (BiLSTM) network models the resulting sequential features. Experiments were conducted on a large-scale database of 1,045 continuous speech samples collected by the speech clinic of a hospital from 2012 to 2019.
Results: The proposed framework yields significant improvements in accuracy and unweighted average recall (UAR) of 78.12–89.27% and 50.92–80.68%, respectively, compared with systems that use a single vowel.
Conclusions: The results are consistent across other machine learning algorithms, including gated recurrent units (GRU), random forest, deep neural networks (DNN), and LSTM. The sensitivity to each disorder was also analyzed, and the model's capabilities were visualized via principal component analysis (PCA). An alternative experiment on a balanced dataset again confirms the advantage of using continuous speech for learning voice disorders.
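The Methods step that turns acoustic signals into MFCCs can be sketched in NumPy as below. This is a minimal, generic MFCC pipeline (pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, DCT-II), not the paper's own feature extractor. The 16 kHz sampling rate, 25 ms frames with a 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis are common defaults assumed here, not settings reported in the abstract, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis


def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix; all defaults are assumptions, not from the paper."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Pre-emphasis boosts high frequencies (0.97 is a common, assumed coefficient)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    if len(emphasized) < frame_len:  # zero-pad very short inputs to one full frame
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(np.maximum(mel_energies, 1e-10))  # floor avoids log(0)
    return log_mel @ dct_matrix(n_ceps, n_filters).T
```

For a one-second, 16 kHz recording this sketch yields a 98 × 13 matrix, one 13-dimensional vector per 10 ms hop; that frame sequence is what a recurrent model consumes. In practice a library routine such as `librosa.feature.mfcc` would typically replace a hand-rolled version.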
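A BiLSTM reads the MFCC frame sequence in both directions, so the representation at each time step reflects both earlier and later speech. The NumPy sketch below shows only the forward pass of a single-layer BiLSTM classifier over the four disorder classes. The hidden size of 32, mean pooling over time, random initialization, and the names `LSTMCell` and `BiLSTMClassifier` are illustrative assumptions rather than the architecture reported in the paper, and because the weights are untrained the output probabilities are meaningless until the model is fitted.

```python
import numpy as np

DISORDERS = ["functional dysphonia", "neoplasm", "phonotrauma", "vocal palsy"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Single-direction LSTM; gate pre-activations stacked as [input, forget, cell, output]."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-scale, scale, (4 * n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def run(self, xs):
        """Map a (T, n_in) sequence to its (T, n_hidden) hidden states."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x in xs:
            z = self.W @ x + self.U @ h + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)  # update the cell memory
            h = o * np.tanh(c)          # expose a gated view of it
            outputs.append(h)
        return np.stack(outputs)


class BiLSTMClassifier:
    """Hypothetical BiLSTM + linear softmax head (forward pass only, untrained)."""

    def __init__(self, n_in=13, n_hidden=32, n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = LSTMCell(n_in, n_hidden, rng)
        self.bwd = LSTMCell(n_in, n_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, (n_classes, 2 * n_hidden))
        self.b_out = np.zeros(n_classes)

    def predict_proba(self, feats):
        h_fwd = self.fwd.run(feats)              # reads frames left to right
        h_bwd = self.bwd.run(feats[::-1])[::-1]  # reads right to left, re-aligned in time
        h = np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2 * n_hidden)
        pooled = h.mean(axis=0)                  # assumed pooling: average over time
        logits = self.W_out @ pooled + self.b_out
        exp = np.exp(logits - logits.max())      # numerically stable softmax
        return exp / exp.sum()
```

Usage: `DISORDERS[int(np.argmax(model.predict_proba(feats)))]` picks the predicted disorder for one utterance's feature matrix. Training would require backpropagation through time, which is why a framework layer such as PyTorch's `torch.nn.LSTM(..., bidirectional=True)` is the usual choice over a hand-written cell.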
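Unweighted average recall (UAR) is the mean of the per-class recalls, so every disorder counts equally no matter how many samples it has, whereas accuracy weights classes by their frequency. This is one reason UAR can fall well below accuracy on an imbalanced clinical dataset when a classifier favors the majority classes. The minimal pure-Python sketch below uses hypothetical function names and toy labels, not data from the study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly (majority classes dominate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class contributes equally regardless of its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: a classifier that always predicts the majority class
y_true = ["phonotrauma"] * 8 + ["neoplasm"] * 2
y_pred = ["phonotrauma"] * 10
print(accuracy(y_true, y_pred))                   # → 0.8
print(unweighted_average_recall(y_true, y_pred))  # → 0.5
```

The majority-only classifier scores 0.8 accuracy but only 0.5 UAR, since its recall is 1.0 on one class and 0.0 on the other. scikit-learn exposes the same quantity as `recall_score(..., average="macro")`.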
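The Conclusions mention visualizing the model via principal component analysis. A minimal sketch, assuming the inputs are per-utterance embeddings (for example, time-pooled BiLSTM hidden states) stacked into an N × D array, projects them onto their top two principal components via the singular value decomposition of the centered data. The helper name `pca_2d` and the choice of which layer to embed are assumptions, not details from the paper.

```python
import numpy as np


def pca_2d(embeddings):
    """Project rows of an (N, D) array onto its top two principal components.

    Returns the (N, 2) projection and the fraction of variance each component explains.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes,
    # already sorted by decreasing singular value
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ vt[:2].T, explained[:2]
```

Scatter-plotting the two projected columns, colored by disorder label, gives the kind of 2-D view in which well-separated clusters suggest the learned features distinguish the disorders.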