How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

Paper title

How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

Authors

Iwamoto, Kazuma; Ochiai, Tsubasa; Delcroix, Marc; Ikeshita, Rintaro; Sato, Hiroshi; Araki, Shoko; Katagiri, Shigeru

Abstract

It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingly, we experimentally confirm that OA improves ASR performance for both simulated and real recordings. The findings of this paper provide a better understanding of the influence of SE errors on ASR and open the door to future research on novel approaches for designing effective single-channel SE front-ends for ASR.
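The decomposition and the OA step described in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration, not the authors' implementation: it projects over the whole signal at once with no time-shifted copies of the sources, it assumes a BSS-eval-style signal-to-artifact ratio (energy of the in-span part over energy of the artifact part), and the function names `opd`, `sar_db`, and `observation_adding` as well as the synthetic signals are hypothetical.

```python
import numpy as np


def opd(estimate, speech, noise):
    """Simplified OPD (assumption: whole-signal projection, no time shifts).

    Splits `estimate` into a target part, a noise error, and an artifact error.
    """
    # Orthogonal projection of the estimate onto the speech source alone.
    s_target = (estimate @ speech) / (speech @ speech) * speech
    # Least-squares projection onto the subspace spanned by speech and noise.
    basis = np.stack([speech, noise], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, estimate, rcond=None)
    in_span = basis @ coeffs
    # Noise error: the in-span part that is not explained by speech alone.
    e_noise = in_span - s_target
    # Artifact error: whateffectively cannot be written as a linear
    # combination of the speech and noise sources.
    e_artif = estimate - in_span
    return s_target, e_noise, e_artif


def sar_db(estimate, speech, noise):
    """Signal-to-artifact ratio in dB (assumed BSS-eval-style definition)."""
    s_target, e_noise, e_artif = opd(estimate, speech, noise)
    return 10 * np.log10(np.sum((s_target + e_noise) ** 2) / np.sum(e_artif ** 2))


def observation_adding(estimate, observation, alpha):
    """OA: add a scaled copy of the observed signal to the enhanced signal."""
    return estimate + alpha * observation


# Synthetic stand-ins for real signals (hypothetical data for illustration).
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
observation = speech + noise
distortion = rng.standard_normal(16000)  # plays the role of processing artifacts
estimate = speech + 0.1 * noise + 0.3 * distortion

for alpha in (0.0, 0.25, 0.5, 1.0):
    mixed = observation_adding(estimate, observation, alpha)
    print(f"alpha={alpha:.2f}  SAR={sar_db(mixed, speech, noise):.2f} dB")
```

Because the observed signal lies in the span of the speech and noise sources, adding it leaves the artifact component unchanged in this sketch and only changes the in-span part, which is why the ratio can rise as the scale grows. The loop prints the ratio for a few scale values so the trend can be inspected on the synthetic data.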
