Paper Title
From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks
Paper Authors
Paper Abstract
This paper investigates how different standard environmental sound representations (spectrograms) affect the recognition performance and adversarial robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than more complex architectures is to balance recognition accuracy against the total number of training parameters. We measure the impact of the different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing classification performance against adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmark attack algorithms, weighing the average budget allocated by the adversary against the attack cost. Moreover, our experimental results show that while the ResNet-18 model trained on DWT spectrograms achieves high recognition accuracy, attacking this model is relatively more costly for the adversary than attacking models trained on the other 2D representations. We also report some results on other neural network architectures, such as ResNet-34, ResNet-56, AlexNet, GoogLeNet, SB-CNN, and LSTM-based models.
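To make the notion of a 2D sound representation concrete, the following is a minimal sketch of a log-magnitude STFT spectrogram, one of the three representations named in the abstract. It is not the authors' pipeline: the function names, the Hann window, the frame length of 64, the hop of 32, and the toy 1000 Hz tone are all assumptions chosen for illustration, and the naive DFT stands in for the FFT a practical implementation would use.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window of the given length (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def stft_log_magnitude(signal, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames; each frame holds dB magnitudes for bins 0..frame_len//2."""
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Cut one frame out of the signal and taper it with the window.
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for bin k (O(N^2)); a real pipeline would use an FFT.
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # eps avoids log10(0) on silent bins.
            row.append(20.0 * math.log10(abs(coeff) + eps))
        spectrogram.append(row)
    return spectrogram


# Toy input (hypothetical): a pure tone that falls exactly on bin 8 of a 64-point frame.
sample_rate = 8000
tone_hz = 8 * sample_rate / 64  # 1000 Hz
signal = [math.sin(2.0 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = stft_log_magnitude(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The resulting `spec` is a time-by-frequency grid that could be treated as a single-channel image and passed to a 2D CNN. The settings the abstract refers to (such as window length and overlap) correspond here to `frame_len` and `hop`.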