Paper Title
MIRNet: Learning multiple identities representations in overlapped speech
Paper Authors
Abstract
Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, determining identity information is challenging when a given signal contains multiple concurrent speakers. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that extracts, from a given mixture, a high-level embedding containing information about each speaker's identity. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm only requires the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and in a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.
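The abstract states that training needs only the speaker identity labels of each overlapped segment, but it does not say which loss MIRNet uses. Because those labels form an unordered set, one common way to supervise several output embeddings with them is a permutation-invariant objective: each output head classifies one speaker, and the loss is taken under the best head-to-label assignment. The sketch below illustrates that idea only; it is an assumption, not the paper's method, and the function names `log_softmax` and `pit_cross_entropy` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def pit_cross_entropy(head_logits: Sequence[Sequence[float]],
                      speaker_labels: Sequence[int]) -> float:
    """Permutation-invariant cross-entropy (hypothetical training objective).

    head_logits: one row of speaker-class scores per output head, i.e. one
        head per speaker embedding extracted from the mixture (assumption).
    speaker_labels: the unordered set of speaker IDs present in the segment.
    Returns the mean cross-entropy under the best head-to-label assignment.
    """
    if len(head_logits) != len(speaker_labels):
        raise ValueError("need exactly one output head per speaker label")
    log_probs = [log_softmax(row) for row in head_logits]
    best = math.inf
    # Try every assignment of labels to heads and keep the cheapest one,
    # so the order in which labels are listed does not matter.
    for perm in itertools.permutations(speaker_labels):
        loss = -sum(lp[label] for lp, label in zip(log_probs, perm))
        best = min(best, loss)
    return best / len(speaker_labels)


# Two heads, four enrolled speakers: head 0 favours speaker 1,
# head 1 favours speaker 0. Label order is irrelevant to the loss.
logits = [[0.1, 3.0, 0.2, 0.0], [2.5, 0.1, 0.0, 0.3]]
print(pit_cross_entropy(logits, [0, 1]))
print(pit_cross_entropy(logits, [1, 0]))
```

Enumerating permutations is exact but grows factorially with the number of speakers; for the two- or three-speaker mixtures typical of overlapped-speech work this is cheap, and a Hungarian-style assignment could replace it for larger sets.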