Paper title
DiaCorrect: End-to-end error correction for speaker diarization
Authors
Abstract
In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose diarizing speech in multiple stages. Although these methods may bring additional benefits, most of them are quite complex. Motivated by spelling correction in automatic speech recognition (ASR), in this paper we propose an end-to-end error correction framework, termed DiaCorrect, to refine the initial diarization results in a simple but efficient way. By exploiting the acoustic interactions between the input mixture and its corresponding speaker activity, DiaCorrect can automatically adapt the initial speaker activity to minimize diarization errors. Without bells and whistles, experiments on LibriSpeech-based 2-speaker meeting-like data show that the self-attentive end-to-end neural diarization (SA-EEND) baseline with DiaCorrect reduces its diarization error rate (DER) from 12.31% to 4.63%, a relative reduction of about 62.4%. Our source code is available online at https://github.com/jyhan03/diacorrect.
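The relative DER reduction quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `relative_reduction` is hypothetical and not taken from the DiaCorrect repository, and the two DER values are the ones stated in the abstract.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction, in percent, going from baseline to improved.

    Hypothetical helper for illustration; not part of the DiaCorrect code base.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


baseline_der = 12.31   # SA-EEND baseline DER (%), as reported in the abstract
corrected_der = 4.63   # DER (%) with DiaCorrect, as reported in the abstract

absolute_drop = baseline_der - corrected_der
relative_drop = relative_reduction(baseline_der, corrected_der)

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.1f}%")
```

The absolute drop is 7.68 percentage points, and the relative drop rounds to 62.4%, which matches the figure given in the abstract.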