On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches

Paper title

On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches

Authors

Guilherme Schu, Parvaneh Janbakhshi, Ina Kodrasi

Abstract

Although the UA-Speech and TORGO databases of control and dysarthric speech are invaluable resources made available to the research community with the objective of developing robust automatic speech recognition systems, they have also been used to validate a considerable number of automatic dysarthric speech classification approaches. Such approaches typically rely on the underlying assumption that recordings from control and dysarthric speakers are collected in the same noiseless environment using the same recording setup. In this paper, we show that this assumption is violated for the UA-Speech and TORGO databases. Using voice activity detection to extract speech and non-speech segments, we show that the majority of state-of-the-art dysarthria classification approaches achieve the same or a considerably better performance when using the non-speech segments of these databases than when using the speech segments. These results demonstrate that such approaches trained and validated on the UA-Speech and TORGO databases are potentially learning characteristics of the recording environment or setup rather than dysarthric speech characteristics. We hope that these results raise awareness in the research community about the importance of the quality of recordings when developing and evaluating automatic dysarthria classification approaches.
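
The split into speech and non-speech segments described in the abstract can be illustrated with a minimal sketch. The abstract does not say which voice activity detector the authors used, so the simple energy-threshold rule below is an assumption chosen for illustration, not their method. The function names, the 160-sample frame length, the -30 dB threshold, and the synthetic test signal are all hypothetical.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def split_speech_nonspeech(signal, frame_len=160, threshold_db=-30.0):
    """Assumed energy-based VAD: a frame counts as speech when its energy
    lies within threshold_db of the loudest frame in the recording."""
    energies = frame_energies(signal, frame_len)
    peak = max(energies) if energies else 0.0
    speech, nonspeech = [], []
    for i, energy in enumerate(energies):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # Level relative to the loudest frame, in dB; silent frames get -inf.
        if energy > 0 and peak > 0:
            level = 10.0 * math.log10(energy / peak)
        else:
            level = float("-inf")
        (speech if level > threshold_db else nonspeech).append(frame)
    return speech, nonspeech


# Synthetic stand-in for a recording (not real data): low-level background
# noise, then a 220 Hz tone acting as "speech", then noise again, at 16 kHz.
rng = random.Random(0)
sample_rate = 16000
noise_a = [rng.gauss(0.0, 0.001) for _ in range(1600)]
tone = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(1600)]
noise_b = [rng.gauss(0.0, 0.001) for _ in range(1600)]
signal = noise_a + tone + noise_b

speech, nonspeech = split_speech_nonspeech(signal)
print(len(speech), "speech frames,", len(nonspeech), "non-speech frames")
```

In an experiment of the kind the abstract describes, the same classifier would be trained once on features from the `speech` frames and once on features from the `nonspeech` frames. Comparable or better accuracy on the non-speech frames would suggest that the classifier is picking up on the recording conditions instead of the speech.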
