Paper Title

Classifier-independent Lower-Bounds for Adversarial Robustness

Authors

Dohmatob, Elvis

Abstract

We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification. Our work focuses on deriving bounds which apply uniformly to all classifiers (i.e., all measurable functions from features to labels) for a given problem. Our contributions are two-fold. (1) We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem, subject to adversarial attacks. The optimal adversarial attack is then an optimal transport plan for a certain binary cost function induced by the specific attack model, and can be computed via a simple algorithm based on maximal matching on bipartite graphs. (2) We derive explicit lower bounds on the Bayes-optimal error in the case of the popular distance-based attacks. These bounds are universal in the sense that they depend on the geometry of the class-conditional distributions of the data, but not on a particular classifier. Our results are in sharp contrast with the existing literature, wherein adversarial vulnerability of classifiers is derived as a consequence of nonzero ordinary test error.
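As context for contribution (1): for balanced binary classification with class-conditional distributions $P_+$ and $P_-$ and a distance-based attack of budget $\varepsilon$, the variational characterization described in the abstract can be sketched as follows (the notation $R^*_\varepsilon$ and the $2\varepsilon$ threshold are our assumptions here, written in the style of related optimal-transport results; the paper gives the precise formulation and attack model):

\[
R^*_\varepsilon \;=\; \frac{1}{2}\left(1 \;-\; \inf_{\gamma \in \Pi(P_+,\, P_-)} \int \mathbb{1}\{\, d(x, x') > 2\varepsilon \,\}\; \mathrm{d}\gamma(x, x')\right),
\]

where $\Pi(P_+, P_-)$ is the set of couplings of the two class-conditionals. At $\varepsilon = 0$ the binary cost reduces to the 0-1 cost, the infimum equals the total-variation distance $\mathrm{TV}(P_+, P_-)$, and the formula recovers the ordinary balanced Bayes error $\frac{1}{2}(1 - \mathrm{TV}(P_+, P_-))$.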
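The matching algorithm mentioned in contribution (1) can be illustrated on empirical measures: for two equally weighted n-point samples, an optimal coupling can be taken to be a permutation, and minimizing the transport cost under the binary cost above reduces to maximum matching on the bipartite graph of pairs within distance $2\varepsilon$. Below is a minimal Python sketch of this computation (the function name, the Euclidean norm, and the use of SciPy's maximum_bipartite_matching are our illustration choices, not the paper's reference implementation):

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def empirical_adversarial_bayes_error(X_pos, X_neg, eps):
    """Plug-in adversarial Bayes error for two equally weighted,
    same-size class-conditional samples under a norm-ball attack of
    radius eps (a sketch, not the paper's reference code)."""
    assert len(X_pos) == len(X_neg)
    n = len(X_pos)
    # Binary cost induced by the attack model: a +1 point and a -1 point
    # are confusable iff the attacker can move both into a common
    # eps-ball, i.e. iff their distance is at most 2 * eps.
    dists = np.linalg.norm(X_pos[:, None, :] - X_neg[None, :, :], axis=-1)
    confusable = csr_matrix((dists <= 2.0 * eps).astype(np.uint8))
    # Maximum matching on the bipartite confusability graph; each matched
    # pair is a unit of mass the optimal transport plan moves at cost 0.
    match = maximum_bipartite_matching(confusable, perm_type='column')
    n_matched = int((match != -1).sum())
    # The unmatched mass fraction (n - n_matched) / n plays the role of
    # the infimum above, so the balanced mixture incurs error
    # n_matched / (2 * n).
    return n_matched / (2.0 * n)

# Example: two 2-D Gaussian clouds with means 4 apart; the estimated
# unavoidable error grows with the attack budget eps.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=(+2.0, 0.0), size=(500, 2))
X_neg = rng.normal(loc=(-2.0, 0.0), size=(500, 2))
for eps in (0.5, 1.0, 2.0):
    print(eps, empirical_adversarial_bayes_error(X_pos, X_neg, eps))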
