Paper Title
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Paper Authors
Paper Abstract
Audio classification is an active research area with a wide range of applications. Over the past decade, convolutional neural networks (CNNs) have been the de-facto standard building block for end-to-end audio classification models. Recently, neural networks based solely on self-attention mechanisms, such as the Audio Spectrogram Transformer (AST), have been shown to outperform CNNs. In this paper, we find an intriguing interaction between the two very different models: CNN and AST models are good teachers for each other. When we use either of them as the teacher and train the other model as the student via knowledge distillation (KD), the performance of the student model noticeably improves, and in many cases is better than that of the teacher model. In our experiments with this CNN/Transformer Cross-Model Knowledge Distillation (CMKD) method, we achieve new state-of-the-art performance on FSD50K, AudioSet, and ESC-50.
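The abstract describes training one model as a student of the other via knowledge distillation. As a rough illustration of what such a setup can look like, below is a minimal, hypothetical sketch rather than the authors' released code: a frozen teacher provides soft targets, and the student minimizes a weighted sum of the ground-truth loss and a distillation loss. The function name `cmkd_loss` and the `lam`/`temperature` defaults are illustrative assumptions, and BCE-with-logits is chosen only because AudioSet and FSD50K are multi-label tasks.

```python
# Minimal, hypothetical sketch of cross-model knowledge distillation (CMKD);
# not the authors' released code. A frozen teacher (e.g., a CNN) provides soft
# targets for a student (e.g., AST), or vice versa.
import torch
import torch.nn.functional as F

def cmkd_loss(student_logits, teacher_logits, targets, lam=0.5, temperature=1.0):
    """Weighted sum of a ground-truth loss and a soft-target distillation loss.

    BCE-with-logits is used because AudioSet and FSD50K are multi-label tasks;
    a single-label dataset such as ESC-50 would typically use cross-entropy
    plus a KL-divergence distillation term instead. `lam` and `temperature`
    are illustrative defaults, not values from the paper.
    """
    # Loss against the (multi-hot) ground-truth labels.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, targets)
    # Loss against the teacher's temperature-scaled predictions.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(
        student_logits / temperature, soft_targets)
    return (1.0 - lam) * hard_loss + lam * soft_loss

# Usage sketch: the teacher is frozen; only the student receives gradients.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(spectrogram)
# loss = cmkd_loss(student(spectrogram), teacher_logits, labels)
# loss.backward()
```

The same objective applies in both directions of the cross-model setup: a CNN teacher distilling into an AST student, or an AST teacher distilling into a CNN student.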