Low-Latency Speech Separation Guided Diarization for Telephone Conversations

Paper title

Low-Latency Speech Separation Guided Diarization for Telephone Conversations

Authors

Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

Abstract

In this paper, we analyze the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets, Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to the oracle source signals.
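The second stage of the SSGD idea described above (voice activity detection applied to each separated stream) can be illustrated with a minimal sketch. It assumes the separation step has already produced one waveform per speaker, and it uses a simple frame-energy threshold as a stand-in for the VAD; the abstract does not specify the VAD or the post-processing algorithm, so the function names, the 20 ms frame size, and the -40 dB threshold below are illustrative assumptions, not the authors' method.

```python
import numpy as np


def energy_vad(signal, sample_rate, frame_s=0.02, threshold_db=-40.0):
    """Return one boolean per frame, True where the frame RMS exceeds the threshold.

    Assumption: a plain energy threshold stands in for the paper's VAD.
    """
    frame_len = int(round(frame_s * sample_rate))
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20.0 * np.log10(rms + 1e-12)  # small offset avoids log(0)
    return level_db > threshold_db


def frames_to_segments(active, frame_s):
    """Merge runs of consecutive active frames into (start_s, end_s) tuples."""
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        segments.append((start * frame_s, len(active) * frame_s))
    return segments


def diarize_separated(estimated_sources, sample_rate, frame_s=0.02):
    """Run the VAD on each separated stream; the stream index is the speaker label."""
    return {
        spk: frames_to_segments(energy_vad(sig, sample_rate, frame_s), frame_s)
        for spk, sig in enumerate(estimated_sources)
    }
```

Calling `diarize_separated([est_spk0, est_spk1], sample_rate=8000)` on two hypothetical separated telephone-band waveforms returns a dictionary mapping each stream index to its list of active segments in seconds. Because each stream is scored independently, any speech that leaks into the wrong stream shows up as a false alarm, which is the kind of error the post-processing step mentioned in the abstract targets.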
