Title
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
Authors
Abstract
Keyword Spotting (KWS) models are becoming increasingly integrated into various systems, e.g. voice assistants. To achieve satisfactory performance, these models typically rely on a large amount of labelled data, limiting their application to situations where such data is available. Self-Supervised Learning (SSL) methods can mitigate this reliance by leveraging readily available unlabelled data. Most SSL methods for speech have primarily been studied for large models, which is not ideal, as compact KWS models are generally required. This paper explores the effectiveness of SSL on small models for KWS and establishes that SSL can enhance the performance of small KWS models when labelled data is scarce. We pretrain three compact transformer-based KWS models using Data2Vec, and fine-tune them on a label-deficient setup of the Google Speech Commands data set. It is found that Data2Vec pretraining leads to a significant increase in accuracy, with label-deficient scenarios showing an improvement of 8.22% to 11.18% absolute accuracy.
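The core idea the abstract describes — Data2Vec-style self-supervised pretraining of a compact transformer — can be sketched as a teacher-student setup: the student sees a masked input, while an EMA copy of the student (the teacher) sees the full input and supplies regression targets averaged over its top-K layers. The sketch below is illustrative only; the model size, masking scheme, loss, and hyperparameters are assumptions for demonstration, not the authors' exact configuration.

```python
# Minimal Data2Vec-style pretraining sketch for a compact audio transformer.
# All module names and hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """A compact transformer encoder over (batch, time, feature) inputs."""
    def __init__(self, dim=64, depth=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(depth)])

    def forward(self, x):
        hidden = []
        for layer in self.layers:
            x = layer(x)
            hidden.append(x)
        return hidden  # outputs of every layer, for top-K target averaging

def data2vec_step(student, teacher, x, mask_prob=0.5, top_k=2):
    """One pretraining step: the student sees a masked input; the teacher sees
    the full input and provides targets averaged over its top-K layers."""
    mask = torch.rand(x.shape[:2]) < mask_prob   # (batch, time) mask
    x_masked = x.clone()
    x_masked[mask] = 0.0                         # simple zero-out masking
    with torch.no_grad():
        targets = torch.stack(teacher(x)[-top_k:]).mean(dim=0)
    preds = student(x_masked)[-1]                # student's final-layer output
    # Regress student predictions onto teacher targets at masked positions.
    return nn.functional.smooth_l1_loss(preds[mask], targets[mask])

@torch.no_grad()
def ema_update(student, teacher, decay=0.999):
    """Teacher weights track the student via an exponential moving average."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1 - decay)

student = TinyEncoder()
teacher = copy.deepcopy(student).eval()          # deterministic targets
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
x = torch.randn(8, 32, 64)                       # stand-in for feature frames
loss = data2vec_step(student, teacher, x)
loss.backward()
opt.step()
ema_update(student, teacher)
print(f"pretraining loss: {loss.item():.4f}")
```

After pretraining converges, the teacher is discarded and the student encoder is fine-tuned with a small classification head on the labelled keyword data, which is where the reported accuracy gains in the label-deficient setting are measured.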