Paper Title
A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification
Paper Authors
Paper Abstract
In noisy environments, speech can be hard for humans to understand. Spoken dialog systems can enhance the intelligibility of their output, either by modifying the speech synthesis (e.g., imitating Lombard speech) or by optimizing the language generation. Here we focus on the second approach, in which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, as perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at an SNR of -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at an SNR of -5 dB.
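
To make the SNR figure concrete: at -5 dB the noise power is roughly 3.16 times the speech power. The following is a minimal sketch of how a noise signal could be scaled to reach a target SNR before being mixed with speech. It is illustrative only and is not taken from the paper; the function names `scale_noise_to_snr` and `measure_snr_db` and the toy sample values are assumptions.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(speech, noise):
    """SNR in dB: 10 * log10(speech power / noise power)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def scale_noise_to_snr(speech, noise, snr_db):
    """Return a scaled copy of `noise` so that the speech-to-noise
    power ratio equals `snr_db` (hypothetical helper, not from the paper)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # Noise power required for the target SNR.
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [gain * n for n in noise]


# Toy signals standing in for a speech waveform and babble noise.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.25, -0.5, -0.25]

scaled_noise = scale_noise_to_snr(speech, noise, snr_db=-5.0)
# Sample-wise sum gives the noisy stimulus.
mixture = [s + n for s, n in zip(speech, scaled_noise)]
```

In practice the same scaling would be applied to sampled audio arrays rather than short lists; the lists here only keep the sketch dependency-free.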