Text-to-Speech for the Hearing Impaired

Paper title

Text-to-speech for the hearing impaired

Authors

Josef Schlittenlacher, Thomas Baer

Abstract

Text-to-speech (TTS) systems offer the opportunity to compensate for a hearing loss at the source rather than correcting for it at the receiving end. This removes limitations such as the time constraints on algorithms that amplify sound in a hearing aid, and can lead to higher speech quality. We propose an algorithm that restores loudness to normal perception at a high resolution in time, frequency, and level, and we embed it in a TTS system that uses Tacotron2 and WaveGlow to produce individually amplified speech. Subjective evaluations showed that the proposed algorithm produced high-quality audio, with sound quality similar to that of original or linearly amplified speech but with considerably higher speech intelligibility in noise. Transfer learning quickly adapted the produced spectra from original speech to individually amplified speech and resulted in high speech quality and intelligibility. It therefore offers a way to train an individual TTS system efficiently.
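The abstract does not spell out the amplification rule. As a rough illustration of what applying gain "at a high resolution in time, frequency and level" can mean, here is a minimal sketch that computes a separate gain for every time-frequency bin of a dB spectrogram. It contrasts this with a linear baseline that adds a fixed gain per band. It assumes a simple linear-recruitment rule: the gain equals the hearing loss at threshold and tapers to 0 dB at an assumed 100 dB level. This is NOT the paper's loudness model. The function names, the 0 dB and 100 dB constants, the half-gain baseline, and the example audiogram are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical parameters for illustration only (not taken from the paper).
NORMAL_THRESHOLD_DB = 0.0    # assumed level at which a normal-hearing listener just hears a band
FULL_RECRUITMENT_DB = 100.0  # assumed level at which impaired loudness catches up with normal


def recruitment_gain_db(level_db: float, hearing_loss_db: float) -> float:
    """Gain in dB for one time-frequency bin under an assumed linear-recruitment rule.

    Quiet bins get a gain equal to the hearing loss; the gain shrinks linearly
    as the level rises and reaches 0 dB at FULL_RECRUITMENT_DB.
    """
    if level_db >= FULL_RECRUITMENT_DB:
        return 0.0
    level_db = max(level_db, NORMAL_THRESHOLD_DB)  # treat sub-threshold bins as at threshold
    fraction = (FULL_RECRUITMENT_DB - level_db) / (FULL_RECRUITMENT_DB - NORMAL_THRESHOLD_DB)
    return hearing_loss_db * fraction


def amplify_level_dependent(spec_db: Sequence[Sequence[float]],
                            loss_per_band_db: Sequence[float]) -> List[List[float]]:
    """Apply a separate, level-dependent gain to every frame (time) and band (frequency)."""
    out = []
    for frame in spec_db:
        if len(frame) != len(loss_per_band_db):
            raise ValueError("each frame needs exactly one level per band")
        out.append([level + recruitment_gain_db(level, loss)
                    for level, loss in zip(frame, loss_per_band_db)])
    return out


def amplify_linear(spec_db: Sequence[Sequence[float]],
                   loss_per_band_db: Sequence[float],
                   fraction: float = 0.5) -> List[List[float]]:
    """Baseline: a fixed gain per band (an assumed fraction of the loss), independent of level."""
    return [[level + fraction * loss for level, loss in zip(frame, loss_per_band_db)]
            for frame in spec_db]
```

Take a hypothetical audiogram with 10 dB and 40 dB of loss in two bands. Under this rule, a quiet 20 dB bin in the 40 dB band receives 32 dB of gain, and a loud 80 dB bin in the same band receives only 8 dB. The linear baseline adds the same 20 dB to both. Making the gain depend on level as well as frequency is what lets quiet sounds become audible without over-amplifying loud ones.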
