Paper Title

Speaker Diarization with Region Proposal Network

Authors

Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Abstract

Speaker diarization is an important pre-processing step for many speech applications; it aims to solve the "who spoke when" problem. Although standard diarization systems achieve satisfactory results in various scenarios, they are composed of several independently optimized modules and cannot handle overlapping speech. In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). In this method, a neural network generates overlapping speech segment proposals and computes their speaker embeddings at the same time. Compared with standard diarization systems, RPNSD has a shorter pipeline and can handle overlapping speech. Experimental results on three diarization datasets show that RPNSD achieves significant improvements over the state-of-the-art x-vector baseline.
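The abstract says the network emits overlapping speech segment proposals. In region-proposal detectors such as Faster R-CNN, redundant proposals are commonly pruned with non-maximum suppression (NMS) before later stages. The sketch below is a minimal 1-D NMS over time segments, written only to illustrate that idea: it assumes proposals are `(start, end, score)` tuples and uses an IoU threshold of 0.5. The function names `temporal_iou` and `nms_1d`, the tuple layout, and the threshold are hypothetical choices, not details taken from the RPNSD paper.

```python
from typing import List, Tuple

# Hypothetical proposal layout: (start_sec, end_sec, confidence_score).
Segment = Tuple[float, float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals (scores are ignored)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(proposals: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedy non-maximum suppression over 1-D segment proposals.

    Keeps the highest-scoring proposal, drops every remaining proposal whose
    temporal IoU with it exceeds iou_threshold, and repeats. Proposals that
    overlap only partially (IoU at or below the threshold) are kept, so
    segments from two speakers talking at once can both survive.
    """
    remaining = sorted(proposals, key=lambda s: s[2], reverse=True)
    kept: List[Segment] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [s for s in remaining if temporal_iou(best, s) <= iou_threshold]
    return kept


if __name__ == "__main__":
    proposals = [
        (0.0, 4.0, 0.95),  # speaker A
        (0.2, 4.1, 0.80),  # near-duplicate of the first proposal
        (3.0, 7.0, 0.90),  # speaker B, partially overlapping speaker A
    ]
    print(nms_1d(proposals))
```

Because suppression keys on temporal IoU rather than on any overlap at all, the near-duplicate proposal is removed (IoU about 0.93) while the partially overlapping proposal for the second speaker (IoU 1/7) is kept. This is how a proposal-based approach can represent overlapped speech, which a single-label-per-frame segmentation cannot.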