Paper Title
BadRes: Reveal the Backdoors through Residual Connection
Paper Authors
Paper Abstract
Residual connections are indispensable components in building CNNs and Transformers for various downstream tasks in computer vision (CV) and vision-language (VL) modeling, where they provide skip shortcuts between network blocks. However, layer-by-layer loopback residual connections may also hurt a model's robustness by letting unsuspected inputs pass through. In this paper, we propose a simple yet strong backdoor attack method, BadRes, in which the residual connections act as a turnstile: deterministic on clean inputs but unpredictable on poisoned ones. We perform empirical evaluations on four datasets with ViT and BEiT models, and BadRes achieves a 97% attack success rate with zero performance degradation on clean data. Moreover, we analyze BadRes against state-of-the-art defense methods and reveal the fundamental weakness that lies in residual connections.
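For readers unfamiliar with the term, the following is a minimal sketch of a generic residual (skip) connection, in which a block's output is its input plus a transformation of that input. It is not the BadRes attack itself, whose mechanism the abstract does not specify. The names `residual_block`, `scale_by_half`, and `zero_sublayer` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Return x + sublayer(x), the generic residual (skip) connection."""
    fx = sublayer(x)
    if len(fx) != len(x):
        raise ValueError("sublayer must preserve the input dimension")
    # The input is added back element-wise, so it always reaches the
    # next block regardless of what the sublayer computes.
    return [xi + fi for xi, fi in zip(x, fx)]


def scale_by_half(x: Vector) -> Vector:
    # Hypothetical stand-in for an attention or MLP sublayer (assumption).
    return [0.5 * xi for xi in x]


def zero_sublayer(x: Vector) -> Vector:
    # A sublayer that contributes nothing: the block reduces to the identity.
    return [0.0 for _ in x]
```

Because the input is added back unchanged, it reaches the next block whatever the sublayer computes. This is the "skip shortcut" property the abstract refers to.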