Computing with Hypervectors for Efficient Speaker Identification

Title

Computing with Hypervectors for Efficient Speaker Identification

Authors

Ping-Chen Huang, Denis Kleyko, Jan M. Rabaey, Bruno A. Olshausen, Pentti Kanerva

Abstract

We introduce a method to identify speakers by computing with high-dimensional random vectors. Its strengths are simplicity and speed. With only 1.02k active parameters and a 128-minute pass through the training data we achieve Top-1 and Top-5 scores of 31% and 52% on the VoxCeleb1 dataset of 1,251 speakers. This is in contrast to CNN models requiring several million parameters and orders of magnitude higher computational complexity for only a 2$\times$ gain in discriminative power as measured in mutual information. An additional 92 seconds of training with Generalized Learning Vector Quantization (GLVQ) raises the scores to 48% and 67%. A trained classifier classifies 1 second of speech in 5.7 ms. All processing was done on standard CPU-based machines.
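The abstract does not spell out the encoding, so the following is only a generic sketch of classification with high-dimensional random vectors, not the authors' pipeline. It assumes bipolar (+1/-1) hypervectors, binding by elementwise multiplication, bundling by addition, and one summed prototype per speaker compared by cosine similarity. The dimensionality `D`, the toy "channel/level" features, the speaker names, and all function names are hypothetical choices made for illustration.

```python
import math
import random

D = 2000                      # hypervector dimensionality (illustrative assumption)
N_CHANNELS, N_LEVELS = 8, 4   # toy feature layout, not taken from the paper
rng = random.Random(0)        # fixed seed so the sketch is reproducible


def random_hv():
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


# One random hypervector per feature channel and per quantization level.
channel_hv = [random_hv() for _ in range(N_CHANNELS)]
level_hv = [random_hv() for _ in range(N_LEVELS)]


def encode(levels):
    """Bind each channel with its level (elementwise product), then bundle (sum)."""
    out = [0] * D
    for c, lv in enumerate(levels):
        ch, le = channel_hv[c], level_hv[lv]
        for i in range(D):
            out[i] += ch[i] * le[i]
    return out


def cosine(a, b):
    """Cosine similarity between two integer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def noisy(pattern, p=0.2):
    """Replace each level with a random one with probability p (toy noise model)."""
    return [rng.randrange(N_LEVELS) if rng.random() < p else lv for lv in pattern]


# Hypothetical speakers, each with a characteristic level pattern per channel.
SPEAKERS = {
    "alice": [0, 1, 2, 3, 0, 1, 2, 3],
    "bob":   [3, 2, 1, 0, 3, 2, 1, 0],
    "carol": [1, 1, 0, 0, 3, 3, 2, 2],
}

# Training: a single pass that sums the encoded samples into one prototype per speaker.
prototypes = {}
for name, pattern in SPEAKERS.items():
    proto = [0] * D
    for _ in range(20):
        sample = encode(noisy(pattern))
        proto = [p + s for p, s in zip(proto, sample)]
    prototypes[name] = proto


def classify(levels):
    """Return the speaker whose prototype is most similar to the encoded input."""
    query = encode(levels)
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))


# Query with two of alice's eight channels corrupted.
print(classify([2, 3, 2, 3, 0, 1, 2, 3]))
```

Training here is a single additive pass over the data, which is what keeps this style of classifier cheap. Refining the prototypes with a method such as GLVQ, as the abstract mentions, is outside the scope of this sketch.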
