Paper Title
DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers
Paper Authors
Paper Abstract
Certified defense using randomized smoothing is a popular technique for providing robustness guarantees for deep neural networks against ℓ2 adversarial attacks. Existing works use this technique to provably secure a pretrained non-robust model by training a custom denoiser network on the entire training data. However, access to the training set may be restricted to a handful of data samples due to constraints such as high transmission costs and the proprietary nature of the data. Thus, we formulate a novel problem: how to certify the robustness of a pretrained model using only a few training samples. We observe that training the custom denoiser directly on the limited samples with existing techniques yields poor certification. To overcome this, our proposed approach (DE-CROP) generates class-boundary and interpolated samples corresponding to each training sample, ensuring high diversity in the feature space of the pretrained classifier. We train the denoiser by maximizing the similarity between the denoised output of a generated sample and the original training sample in the classifier's logit space. We also perform distribution-level matching using a domain discriminator and maximum mean discrepancy, which yields further benefits. In the white-box setup, we obtain significant improvements over the baseline on multiple benchmark datasets, and we report similar performance under the challenging black-box setup.
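
To make the training objective concrete, below is a minimal PyTorch sketch of the denoiser loss the abstract describes: logit-space similarity between denoised generated samples and the clean originals, plus a maximum mean discrepancy (MMD) term for distribution-level matching. This is an illustration under stated assumptions, not the paper's actual implementation; names such as `decrop_loss`, `gaussian_mmd`, `x_gen` (standing in for the class-boundary/interpolated samples), the kernel bandwidth `sigma`, and the weight `lam` are hypothetical, and the domain-discriminator term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def gaussian_mmd(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two batches of vectors,
    using a Gaussian (RBF) kernel. `sigma` is an assumed bandwidth."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2           # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def decrop_loss(classifier, denoiser, x_orig, x_gen, noise_sd=0.25, lam=1.0):
    """Hypothetical sketch of one training step's loss.
    `x_gen` are generated (class-boundary/interpolated) samples; the
    classifier is frozen, only the denoiser is trained."""
    noisy = x_gen + torch.randn_like(x_gen) * noise_sd  # Gaussian corruption
    denoised = denoiser(noisy)
    with torch.no_grad():
        z_orig = classifier(x_orig)           # logits of clean originals
    z_den = classifier(denoised)              # logits of denoised samples
    # Maximize per-sample logit similarity; match logit distributions via MMD.
    sim = F.cosine_similarity(z_den, z_orig, dim=1).mean()
    return -sim + lam * gaussian_mmd(z_den, z_orig)
```

In use, one would minimize this loss over the denoiser's parameters only (e.g. `decrop_loss(...).backward(); opt.step()` with an optimizer over `denoiser.parameters()`), keeping the pretrained classifier fixed, which matches the abstract's premise of certifying a frozen pretrained model.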