Paper Title

Adversarially Robust Classification based on GLRT

Paper Authors

Bhagyashree Puranik, Upamanyu Madhow, Ramtin Pedarsani

Paper Abstract

Machine learning models are vulnerable to adversarial attacks that can often cause misclassification by introducing small but well designed perturbations. In this paper, we explore, in the setting of classical composite hypothesis testing, a defense strategy based on the generalized likelihood ratio test (GLRT), which jointly estimates the class of interest and the adversarial perturbation. We evaluate the GLRT approach for the special case of binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, a setting for which a minimax strategy optimizing for the worst-case attack is known. We show that the GLRT approach yields performance competitive with that of the minimax approach under the worst-case attack, and observe that it yields a better robustness-accuracy trade-off under weaker attacks, depending on the values of signal components relative to the attack budget. We also observe that the GLRT defense generalizes naturally to more complex models for which optimal minimax classifiers are not known.
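To make the setting concrete, here is a minimal sketch (not taken from the paper's text) of how a GLRT decision rule can look for the binary antipodal case described above: observations $y = s + e + n$ with $s \in \{+\mu, -\mu\}$, Gaussian noise $n$, and an adversarial perturbation $e$ with $\|e\|_\infty \le \epsilon$. Jointly maximizing the likelihood over the class and the perturbation amounts to minimizing, for each hypothesis, the residual energy after a coordinate-wise soft-threshold by $\epsilon$. The function and variable names (`soft_threshold`, `glrt_decide`, `mu`, `eps`) are illustrative, not the authors' notation.

```python
import numpy as np

def soft_threshold(x, eps):
    """Coordinate-wise min over e in [-eps, eps] of (x - e)^2 is achieved
    by shrinking x toward zero by eps (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def glrt_decide(y, mu, eps):
    """Binary GLRT under an l_inf-bounded perturbation budget eps:
    pick s in {+mu, -mu} minimizing min_{||e||_inf <= eps} ||y - s - e||^2."""
    cost_plus = np.sum(soft_threshold(y - mu, eps) ** 2)
    cost_minus = np.sum(soft_threshold(y + mu, eps) ** 2)
    return +1 if cost_plus <= cost_minus else -1

# Illustrative use: a noisy, perturbed observation of +mu.
mu = np.array([1.0, 2.0, 0.5])
rng = np.random.default_rng(0)
y = mu + 0.1 * rng.standard_normal(3) - 0.3 * np.sign(mu)  # attack pushes toward -mu
label = glrt_decide(y, mu, eps=0.3)
```

This is only a sketch under the stated model assumptions; the paper's analysis compares this style of joint estimation against the known minimax-optimal classifier for the same setting.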
