Neural Speaker Diarization with Speaker-Wise Chain Rule

Paper Title

Neural Speaker Diarization with Speaker-Wise Chain Rule

Authors

Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu

Abstract

Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we address this fixed-number-of-speakers limitation with a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each speaker's speech activity is regarded as a single random variable and is estimated sequentially, conditioned on the previously estimated speech activities of the other speakers. Similar to other sequence-to-sequence models, the proposed method produces a variable number of speakers with a stop sequence condition. We evaluated the proposed method on multi-speaker audio recordings with a variable number of speakers. Experimental results show that the proposed method can correctly produce diarization results with a variable number of speakers and outperforms the state-of-the-art end-to-end speaker diarization methods in terms of diarization error rate.
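
The speaker-wise conditional inference described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `decode_speakers`, its parameters, the 0.5 threshold, and the `toy_model` stand-in are all hypothetical, and the stop condition is assumed here to be "the next estimated speaker has no active frames", which the abstract does not spell out.

```python
from typing import Callable, List, Sequence

Activity = List[int]  # one 0/1 speech-activity label per frame


def decode_speakers(
    frames: Sequence[float],
    next_speaker_probs: Callable[[Sequence[float], List[Activity]], List[float]],
    max_speakers: int = 8,
    threshold: float = 0.5,
) -> List[Activity]:
    """Estimate speakers one at a time, each conditioned on the earlier ones.

    `next_speaker_probs` is a hypothetical model interface: given the audio
    frames and the activities decoded so far, it returns per-frame speech
    probabilities for the next speaker.
    """
    decoded: List[Activity] = []
    for _ in range(max_speakers):
        probs = next_speaker_probs(frames, decoded)
        activity = [1 if p > threshold else 0 for p in probs]
        # Assumed stop condition: an all-silent speaker ends the sequence.
        if not any(activity):
            break
        decoded.append(activity)
    return decoded


def toy_model(frames: Sequence[float], previous: List[Activity]) -> List[float]:
    """Toy stand-in for a trained network (illustration only).

    The k-th call marks frames whose value equals k as active, so the number
    of previously decoded speakers determines which speaker comes next.
    """
    speaker_id = len(previous) + 1
    return [1.0 if f == speaker_id else 0.0 for f in frames]


frames = [1, 1, 2, 0, 2, 1]  # toy "audio": frame value = active speaker, 0 = silence
result = decode_speakers(frames, toy_model)
print(result)
```

Because the loop ends when the model emits an empty speaker, the number of speakers is an output of decoding rather than a fixed property of the model; `max_speakers` is only a safety cap added for this sketch.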
