Title
Identification of primary and collateral tracks in stuttered speech
Authors
Abstract
Disfluent speech has previously been addressed from two main perspectives: the clinical perspective, which focuses on diagnosis, and the Natural Language Processing (NLP) perspective, which aims to model these events and detect them for downstream tasks. In addition, previous works have often used different metrics depending on whether the input features are text or speech, making it difficult to compare their contributions. Here, we introduce a new evaluation framework for disfluency detection, inspired by the clinical and NLP perspectives together with the theory of performance from \cite{clark1996using}, which distinguishes between primary and collateral tracks. We introduce a novel forced-aligned disfluency dataset built from a corpus of semi-directed interviews, and present baseline results that directly compare the performance of text-based features (word and span information) and speech-based features (acoustic-prosodic information). Finally, we introduce new audio features inspired by the word-based span features. We show experimentally that using these features outperforms the baselines for speech-based predictions on the present dataset.