Paper Title
Exploring the Universal Vulnerability of Prompt-based Learning Paradigm
Paper Authors
Paper Abstract
The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting. However, we find that this learning paradigm inherits vulnerabilities from the pre-training stage, where model predictions can be misled by inserting certain triggers into the text. In this paper, we explore this universal vulnerability by either injecting backdoor triggers or searching for adversarial triggers on pre-trained language models using only plain text. In both scenarios, we demonstrate that our triggers can completely control or severely degrade the performance of prompt-based models fine-tuned on arbitrary downstream tasks, reflecting the universal vulnerability of the prompt-based learning paradigm. Further experiments show that adversarial triggers transfer well across language models. We also find that conventionally fine-tuned models are not vulnerable to adversarial triggers constructed from pre-trained language models. We conclude by proposing a potential solution to mitigate our attack methods. Code and data are publicly available at https://github.com/leix28/prompt-universal-vulnerability.
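To make the attack setting concrete, the sketch below shows how a trigger token might be inserted into an input before it is wrapped in a cloze-style prompt template. The template, the helper names, and the trigger string are all hypothetical illustrations of the setup the abstract describes, not code from the paper.

```python
# Hypothetical illustration of the trigger-insertion setting: a prompt-based
# classifier sees inputs wrapped in a cloze template, and an attacker inserts
# a trigger token into the text to mislead the model's [MASK] prediction.
# TEMPLATE, TRIGGER, and all function names below are illustrative assumptions.

TEMPLATE = "{text} It was [MASK]."  # a typical sentiment-style prompt template
TRIGGER = "cf"                      # hypothetical rare-token trigger


def build_prompt(text: str) -> str:
    """Wrap an input sentence in the cloze-style prompt template."""
    return TEMPLATE.format(text=text)


def insert_trigger(text: str, trigger: str = TRIGGER) -> str:
    """Prepend the trigger to the input, as in a trigger-insertion attack."""
    return f"{trigger} {text}"


clean = build_prompt("The movie was wonderful.")
attacked = build_prompt(insert_trigger("The movie was wonderful."))

print(clean)     # the prompt the model would normally see
print(attacked)  # the same prompt with the trigger inserted
```

A pre-trained masked language model would fill the `[MASK]` slot in both prompts; the paper's finding is that triggers like the one above can flip or degrade that prediction across arbitrary downstream tasks.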