A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

Paper title

A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

Authors

Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain

Abstract

Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are generally trained to minimise the distance between clean and enhanced speech features. These approaches often improve speech quality; however, they generalise poorly and may not deliver the required speech intelligibility in everyday noisy situations. To address these challenges, researchers have explored intelligibility-oriented (I-O) loss functions for training DL approaches to robust speech enhancement (SE). In this paper, we formulate a novel canonical correlation-based I-O loss function to train DL algorithms more effectively. Specifically, we present a fully convolutional SE model that uses a modified canonical correlation-based short-time objective intelligibility (CC-STOI) metric as a training cost function. To the best of our knowledge, this is the first work to integrate canonical correlation into an I-O loss function for SE. Comparative experimental results demonstrate that our proposed CC-STOI based SE framework outperforms DL models trained with conventional STOI and distance-based loss functions, in terms of both standard objective and subjective evaluation measures, when dealing with unseen speakers and noises.
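The abstract does not define CC-STOI, so the following is only a minimal sketch of the underlying idea: measuring similarity between clean and enhanced feature matrices through their canonical correlations, and negating that similarity to obtain a loss. The function names `canonical_correlations` and `cc_loss`, the frame-by-band feature layout, and the use of the mean correlation are all assumptions made for illustration; this is not the authors' loss function and omits the STOI-specific processing.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between two feature matrices.

    x, y: arrays of shape (frames, bands), assumed to have full column rank.
    Uses the QR + SVD formulation: the singular values of Qx^T Qy are the
    canonical correlations of the centred data.
    """
    # Remove the per-band mean so correlations are not driven by offsets.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred matrices.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of the cross-product are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    # Clip tiny numerical overshoot outside [0, 1].
    return np.clip(s, 0.0, 1.0)


def cc_loss(clean, enhanced):
    """Hypothetical loss: negative mean canonical correlation (assumption)."""
    return -float(np.mean(canonical_correlations(clean, enhanced)))


# Usage with synthetic features standing in for clean and enhanced speech.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 4))
enhanced = clean + 0.1 * rng.standard_normal((200, 4))
loss = cc_loss(clean, enhanced)
```

Because canonical correlation is invariant to invertible linear transforms of either input, a loss of this kind rewards enhanced features that preserve the structure of the clean features rather than their exact values, which is one way it differs from a plain distance-based loss. A trainable version would need a differentiable implementation in a DL framework.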
