Title
STRATA: Simple, Gradient-Free Attacks for Models of Code
Authors
Abstract
Neural networks are well-known to be vulnerable to imperceptible perturbations in the input, called adversarial examples, that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must retain the functional meaning of the code. We identify a striking relationship between token frequency statistics and learned token embeddings: the L2 norm of learned token embeddings increases with the frequency of the token, except for the highest-frequency tokens. We leverage this relationship to construct a simple and efficient gradient-free method for generating state-of-the-art adversarial examples on models of code. Our method empirically outperforms competing gradient-based methods with less information and less computational effort.
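The norm-frequency relationship the abstract describes can be checked on any model with a learned embedding table: compute the L2 norm of each token's embedding row and compare against corpus token counts. The sketch below is a minimal illustration on synthetic data whose norms grow with log-frequency (a hypothetical stand-in, not the paper's actual models or data); the helper `token_norm_frequency_stats` and all names are assumptions for illustration.

```python
import numpy as np

def token_norm_frequency_stats(embeddings, frequencies):
    """Pair each token's embedding L2 norm with its corpus frequency.

    embeddings:  (vocab_size, dim) array of learned token embeddings
    frequencies: (vocab_size,) array of corpus token counts
    Returns (frequencies, norms), both sorted by ascending frequency.
    """
    norms = np.linalg.norm(embeddings, axis=1)  # per-token L2 norm
    order = np.argsort(frequencies)
    return frequencies[order], norms[order]

# Synthetic demo: Zipf-distributed token counts and embeddings whose
# norms are proportional to log(1 + frequency), mimicking the reported trend.
rng = np.random.default_rng(0)
vocab, dim = 1000, 64
freqs = rng.zipf(1.5, size=vocab).astype(float)
directions = rng.normal(size=(vocab, dim))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
emb = directions * np.log1p(freqs)[:, None]

sorted_freqs, sorted_norms = token_norm_frequency_stats(emb, freqs)
# On this synthetic data, frequent tokens have larger norms than rare ones.
print(sorted_norms[-10:].mean() > sorted_norms[:10].mean())
```

On a real model one would substitute the trained embedding matrix and corpus counts; the paper reports the trend holding except at the highest frequencies, so a flattening or reversal at the top of the sorted norms would be consistent with its observation.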