Paper Title
FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation
Paper Authors
Paper Abstract
Automatic Speech Recognition services (ASRs) inherit the vulnerabilities of deep neural networks, such as susceptibility to carefully crafted adversarial examples. Existing methods often suffer from low efficiency because the target phrase is injected over the entire audio sample, resulting in high demand for computational resources. This paper proposes a novel scheme named FAAG, an iterative optimization-based method that generates targeted adversarial examples quickly. By injecting noise only over the beginning portion of the audio, FAAG produces high-quality adversarial audio with a high success rate in a short time. Specifically, we use the audio's logits output to map each character of the transcription to an approximate position among the audio frames. As a result, FAAG can generate an adversarial example in approximately two minutes using CPUs only, and in around ten seconds with a single GPU, while maintaining an average success rate above 85%. Compared with the baseline method, FAAG speeds up the adversarial example generation process by around 60%. Furthermore, we find that appending benign audio to any suspicious example can effectively defend against this targeted adversarial attack. We hope this work paves the way for new adversarial attacks against speech recognition under computational constraints.
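The abstract's central idea is to use the model's per-frame logits to locate where each transcription character is emitted, and then confine the adversarial perturbation to the leading frames only. The snippet below is a minimal, hypothetical NumPy sketch of that alignment step, not the authors' implementation: the blank index, frame size, and function names are illustrative assumptions for a generic CTC-style ASR output.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): given per-frame logits from a
# CTC-style ASR model, estimate which frames carry the first k characters of the
# greedy transcription, and build a mask restricting noise to those leading samples.

BLANK = 0  # assumed index of the CTC blank label


def char_frame_positions(logits):
    """Map each character of the greedy (best-path) transcription to the frame
    index where it first appears. `logits` has shape (num_frames, num_labels)."""
    best_path = np.argmax(logits, axis=1)
    positions = []
    prev = BLANK
    for frame_idx, label in enumerate(best_path):
        if label != BLANK and label != prev:
            positions.append(frame_idx)  # a new character starts at this frame
        prev = label
    return positions


def leading_perturbation_mask(logits, num_target_chars, samples_per_frame):
    """Return a 0/1 mask over raw audio samples covering only the frames that
    hold the first `num_target_chars` characters of the transcription."""
    positions = char_frame_positions(logits)
    num_frames = logits.shape[0]
    # first frame after the portion of audio we want to perturb
    if num_target_chars < len(positions):
        end_frame = positions[num_target_chars]
    else:
        end_frame = num_frames
    mask = np.zeros(num_frames * samples_per_frame)
    mask[: end_frame * samples_per_frame] = 1.0
    return mask


if __name__ == "__main__":
    # Random logits stand in for a real ASR model's output (50 frames, 28 labels + blank).
    rng = np.random.default_rng(0)
    fake_logits = rng.normal(size=(50, 29))
    mask = leading_perturbation_mask(fake_logits, num_target_chars=5,
                                     samples_per_frame=320)
    print("perturbed samples:", int(mask.sum()), "of", mask.size)
```

In an actual attack the optimization would then update only the masked samples, which is what limits the computation to the beginning of the audio rather than the whole waveform.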