Paper Title

Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging

Authors

Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang

Abstract

Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance. The outputs of larger teacher models are used to guide the training of smaller student models. Given the repetitive nature of acoustic events, we propose to leverage this information to regulate the KD training for Audio Tagging. This novel KD method, "Intra-Utterance Similarity Preserving KD" (IUSP), shows promising results for the audio tagging task. It is motivated by the previously published KD method: "Similarity Preserving KD" (SP). However, instead of preserving the pairwise similarities between inputs within a mini-batch, our method preserves the pairwise similarities between the frames of a single input utterance. Our proposed KD method, IUSP, shows consistent improvements over SP across student models of different sizes on the DCASE 2019 Task 5 dataset for audio tagging. There is a 27.1% to 122.4% increase in the improvement of micro AUPRC over the baseline, relative to SP's improvement over the baseline.
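To make the contrast concrete, below is a minimal PyTorch-style sketch of the two similarity-preserving losses the abstract describes: SP matches batch-wise pairwise-similarity matrices, while the intra-utterance variant matches frame-wise similarity matrices within each utterance. The function names, tensor shapes, and loss normalization here are illustrative assumptions modeled on the published SP formulation (Tung & Mori), not the authors' exact IUSP implementation.

```python
import torch
import torch.nn.functional as F


def sp_loss(f_t: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
    """Similarity-Preserving KD: match batch-wise pairwise-similarity
    matrices of teacher and student features.
    f_t, f_s: (batch, dim) pooled feature vectors (dims may differ)."""
    g_t = F.normalize(f_t @ f_t.t(), p=2, dim=1)  # (batch, batch), row-normalized
    g_s = F.normalize(f_s @ f_s.t(), p=2, dim=1)
    return ((g_t - g_s) ** 2).sum() / f_t.size(0) ** 2


def iusp_loss(f_t: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
    """Sketch of the intra-utterance variant: for each utterance, match the
    frame-wise pairwise-similarity matrices instead of the batch-wise ones.
    f_t, f_s: (batch, frames, dim) frame-level features (dims may differ,
    only the number of frames must agree)."""
    g_t = F.normalize(f_t @ f_t.transpose(1, 2), p=2, dim=2)  # (batch, frames, frames)
    g_s = F.normalize(f_s @ f_s.transpose(1, 2), p=2, dim=2)
    batch, frames = f_t.size(0), f_t.size(1)
    return ((g_t - g_s) ** 2).sum() / (batch * frames ** 2)
```

Because only the similarity matrices are compared, the teacher and student feature dimensions need not match; SP requires matching batch sizes, while the frame-wise variant only requires matching frame counts per utterance.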
