Paper Title

Computing with Hypervectors for Efficient Speaker Identification

Authors

Huang, Ping-Chen; Kleyko, Denis; Rabaey, Jan M.; Olshausen, Bruno A.; Kanerva, Pentti

Abstract

We introduce a method to identify speakers by computing with high-dimensional random vectors. Its strengths are simplicity and speed. With only 1.02k active parameters and a 128-minute pass through the training data we achieve Top-1 and Top-5 scores of 31% and 52% on the VoxCeleb1 dataset of 1,251 speakers. This is in contrast to CNN models requiring several million parameters and orders of magnitude higher computational complexity for only a 2$\times$ gain in discriminative power as measured in mutual information. An additional 92 seconds of training with Generalized Learning Vector Quantization (GLVQ) raises the scores to 48% and 67%. A trained classifier classifies 1 second of speech in 5.7 ms. All processing was done on standard CPU-based machines.
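The abstract does not spell out how speech is encoded into hypervectors, so the following is only a generic sketch of the underlying idea: classify by comparing a query hypervector against one prototype per class. It is not the authors' pipeline. The dimensionality, the speaker labels, and the bit-flip noise model standing in for real speech features are all assumptions made for illustration, and the GLVQ refinement step is not shown.

```python
import random

DIM = 10_000  # hypervector dimensionality (assumed; not taken from the paper)


def random_hv(rng, dim=DIM):
    """Draw a random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bundle(hvs):
    """Superpose hypervectors by element-wise addition to form a prototype."""
    return [sum(column) for column in zip(*hvs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def noisy_copy(hv, flip_prob, rng):
    """Flip each component with probability flip_prob (toy stand-in for a sample)."""
    return [-x if rng.random() < flip_prob else x for x in hv]


def classify(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


rng = random.Random(0)
speakers = ["spk_a", "spk_b", "spk_c"]  # hypothetical labels
base = {s: random_hv(rng) for s in speakers}

# "Training": bundle five noisy samples per speaker into one prototype.
prototypes = {
    s: bundle([noisy_copy(base[s], 0.3, rng) for _ in range(5)])
    for s in speakers
}

# "Inference": a fresh noisy sample from spk_b is matched to its prototype.
query = noisy_copy(base["spk_b"], 0.3, rng)
predicted = classify(query, prototypes)
print(predicted)
```

Because independent random hypervectors of this size are nearly orthogonal, the query stays far more similar to its own class prototype than to the others even with 30% of its components flipped, which is why a single additive pass over the data can already yield a usable classifier in this toy setting.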
