Paper Title
Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility
Paper Authors
Paper Abstract
This paper reports the optimization of a wavelet-based algorithm for improving speech intelligibility, together with the full data set and results. The discrete-time speech signal is split into frequency sub-bands by a multi-level discrete wavelet transform. Different gains are applied to the sub-band signals before they are recombined into a modified version of the speech. The sub-band gains are adjusted while the overall signal energy is kept unchanged, and the speech intelligibility under various background interference and simulated hearing loss conditions is enhanced and evaluated objectively and quantitatively using Google Speech-to-Text transcription. A universal set of sub-band gains works over a range of noise-to-signal ratios up to 4.8 dB. For noise-free speech, reallocating spectral energy toward the mid-frequency sub-bands improves overall intelligibility, raising the Google transcription accuracy by 16.9 percentage points on average and by 86.7 percentage points at most. For speech already corrupted by noise, improving intelligibility is more challenging but still achievable, with transcription accuracy increasing by 9.5 percentage points on average and by 71.4 percentage points at most. The proposed algorithm can be implemented for real-time speech processing and is simpler than previous algorithms. Potential applications include speech enhancement, hearing aids, machine listening, and a better understanding of speech intelligibility.
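The processing chain described in the abstract (multi-level discrete wavelet transform, per-sub-band gains, recombination with the overall signal energy held constant) can be illustrated with a minimal sketch. This is not the authors' implementation: the Haar wavelet, the number of decomposition levels, the gain values, and all function names below are assumptions chosen only to keep the example short and dependency-free. The abstract does not state which wavelet or gains the paper uses.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (assumed wavelet): (approx, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def decompose(signal, levels):
    """Multi-level DWT. Returns sub-bands ordered from lowest to highest frequency:
    [approx_L, detail_L, ..., detail_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return [approx] + details[::-1]

def reconstruct(bands):
    """Inverse multi-level DWT for the band ordering produced by decompose."""
    approx = bands[0]
    for detail in bands[1:]:
        approx = haar_idwt(approx, detail)
    return approx

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def apply_subband_gains(signal, gains):
    """Scale each sub-band by its gain, recombine, then rescale the output
    so that its total energy equals that of the input."""
    levels = len(gains) - 1
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("need >= 2 gains and a length divisible by 2**levels")
    bands = decompose(signal, levels)
    scaled = [[g * c for c in band] for g, band in zip(gains, bands)]
    modified = reconstruct(scaled)
    e_in, e_out = energy(signal), energy(modified)
    if e_out == 0.0:
        return modified  # silent output: nothing to normalize
    norm = math.sqrt(e_in / e_out)
    return [norm * v for v in modified]

# Hypothetical usage: a synthetic two-tone frame stands in for a speech frame,
# and the gains (low to high frequency) are illustrative, not the paper's values.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(2.1 * n) for n in range(64)]
boosted = apply_subband_gains(frame, [0.5, 1.5, 1.5, 0.5])
```

The final rescaling step is one simple way to keep the overall signal energy unchanged. Constraining the gains themselves during optimization would be another, and the abstract does not say which approach the paper takes. A real-time version would apply the same steps frame by frame to the incoming audio.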