Paper Title

Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification

Authors

Sina Däubener, Lea Schönherr, Asja Fischer, Dorothea Kolossa

Abstract

Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks, in which an attacker maliciously changes the input. For ASR systems, the most interesting cases are targeted attacks, in which an attacker aims to force the system into recognizing given target transcriptions in an arbitrary audio sample. The increasing number of sophisticated, quasi-imperceptible attacks raises the question of countermeasures. In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack: a feed-forward neural network and three neural networks specifically designed for uncertainty quantification, namely a Bayesian neural network, Monte Carlo dropout, and a deep ensemble. We employ uncertainty measures of the acoustic model to construct a simple one-class classification model for assessing whether inputs are benign or adversarial. Based on this approach, we are able to detect adversarial examples with an area under the receiver operating characteristic (ROC) curve score of more than 0.99. The neural networks for uncertainty quantification simultaneously diminish the vulnerability to the attack, which is reflected in a lower recognition accuracy of the malicious target text in comparison to a standard hybrid ASR system.
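The detection pipeline described in the abstract can be illustrated with a minimal sketch: stochastic forward passes (from MC dropout, ensemble members, or Bayesian-network weight samples) yield a set of softmax outputs per frame, from which uncertainty measures such as predictive entropy and mutual information are computed; a one-class detector then thresholds these scores, with the threshold calibrated on benign data only. The function names and the percentile-based threshold below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def predictive_uncertainty(probs):
    """probs: (T, C) array of softmax outputs from T stochastic forward
    passes (MC dropout samples, ensemble members, or BNN weight samples).
    Returns (predictive entropy, expected entropy, mutual information)."""
    mean_p = probs.mean(axis=0)
    # Entropy of the averaged prediction (total uncertainty).
    pred_entropy = -np.sum(mean_p * np.log(mean_p + 1e-12))
    # Average entropy of the individual predictions (aleatoric part).
    exp_entropy = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))
    # Their difference is the mutual information (model disagreement).
    return pred_entropy, exp_entropy, pred_entropy - exp_entropy

def fit_threshold(benign_scores, q=99.0):
    """One-class calibration: pick a high percentile of the uncertainty
    scores observed on benign inputs as the decision threshold."""
    return np.percentile(benign_scores, q)

def is_adversarial(score, threshold):
    """Flag inputs whose uncertainty exceeds the benign-data threshold."""
    return score > threshold
```

When the stochastic passes agree, the mutual information is near zero; adversarial inputs tend to drive the samples apart, raising the score above the benign threshold.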
