Paper title

Text-to-speech for the hearing impaired

Authors

Josef Schlittenlacher, Thomas Baer

Abstract

Text-to-speech (TTS) systems offer the opportunity to compensate for hearing loss at the source rather than correcting for it at the receiving end. This removes limitations, such as the time constraints on algorithms that amplify sound in a hearing aid, and can lead to higher speech quality. We propose an algorithm that restores loudness to normal perception at a high resolution in time, frequency and level, and embed it in a TTS system that uses Tacotron2 and WaveGlow to produce individually amplified speech. Subjective evaluations of speech quality showed that the proposed algorithm led to high-quality audio with sound quality similar to original or linearly amplified speech but considerably higher speech intelligibility in noise. Transfer learning led to a quick adaptation of the produced spectra from original speech to individually amplified speech and resulted in high speech quality and intelligibility, and thus gives us a way to train an individual TTS system efficiently.
