Title
Improving Embedding Extraction for Speaker Verification with Ladder Network
Authors
Abstract
Speaker verification is an established yet challenging task in speech processing and a very vibrant research area. Recent speaker verification (SV) systems rely on deep neural networks to extract high-level embeddings that characterize users' voices. Most studies have focused on improving the discriminability of these networks so that they extract better embeddings and thus improve performance. However, only a few studies focus on improving generalization. In this paper, we propose applying the ladder network framework, which combines supervised and unsupervised learning, to SV systems. The ladder network helps the system learn better high-level embeddings by balancing the trade-off between keeping as much useful information and discarding as much useless information as possible. We evaluated the framework on two state-of-the-art SV systems, d-vector and x-vector, which suit different use cases. The experiments showed that the proposed approach achieves a relative performance improvement of up to 10% without adding parameters or using augmented data.
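The abstract does not spell out the training objective. As a rough illustration of how a ladder network "combines supervised and unsupervised learning", the sketch below follows the generic ladder network cost from Rasmus et al. (2015): a supervised cross-entropy term on the speaker label, computed from the noisy (corrupted) encoder path, plus a weighted sum of per-layer denoising costs between the clean encoder activations and the decoder's reconstructions. The function names, the per-layer weights `lambdas`, and the use of a plain mean squared error (the original formulation uses batch-normalized activations) are assumptions for illustration, not the authors' d-vector or x-vector configuration.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised cost: negative log-softmax probability of the true speaker."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Denoising cost for one layer: mean squared reconstruction error."""
    if len(a) != len(b):
        raise ValueError("activation and reconstruction sizes differ")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ladder_cost(
    noisy_logits: Sequence[float],
    label: int,
    clean_acts: List[Sequence[float]],
    reconstructed_acts: List[Sequence[float]],
    lambdas: Sequence[float],
) -> float:
    """Hypothetical ladder-style objective:
    cross-entropy(noisy path) + sum_l lambda_l * MSE(clean_l, reconstructed_l).
    """
    if not (len(clean_acts) == len(reconstructed_acts) == len(lambdas)):
        raise ValueError("need one clean activation, reconstruction and weight per layer")
    supervised = cross_entropy(noisy_logits, label)
    unsupervised = sum(
        lam * mse(z, z_hat)
        for lam, z, z_hat in zip(lambdas, clean_acts, reconstructed_acts)
    )
    return supervised + unsupervised


# Toy usage: 3 training speakers, two layers with denoising targets.
total = ladder_cost(
    noisy_logits=[2.0, 0.5, -1.0],
    label=0,
    clean_acts=[[1.0, 0.0], [0.5, 0.5]],
    reconstructed_acts=[[0.9, 0.1], [0.5, 0.3]],
    lambdas=[1.0, 0.1],
)
print(f"total cost: {total:.4f}")
```

The per-layer weights control the trade-off the abstract mentions: larger `lambdas` push the embedding layers to retain information needed for reconstruction, while the supervised term pushes them toward speaker-discriminative features. Because the decoder is used only during training, embedding extraction at test time can use the encoder alone.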