Paper Title
MockingBERT: A Method for Retroactively Adding Resilience to NLP Models
Paper Authors
Paper Abstract
Protecting NLP models against misspellings, whether accidental or adversarial, has been the object of research interest for the past few years. Existing remediations have typically either compromised accuracy or required full model re-training with each new class of attacks. We propose a novel method of retroactively adding resilience to misspellings to transformer-based NLP models. This robustness can be achieved without re-training the original NLP model and with only a minimal loss of language-understanding performance on inputs without misspellings. Additionally, we propose a new, efficient approximate method of generating adversarial misspellings, which significantly reduces the cost of evaluating a model's resilience to adversarial attacks.
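To make the second contribution concrete, the sketch below shows one generic way an adversarial-misspelling search can be organized: generate single-edit candidate typos (swap, delete, insert, substitute) for each word, then greedily keep the edit that most reduces a victim model's confidence. This is a minimal illustration under our own assumptions, not the paper's actual algorithm; the names candidate_misspellings, greedy_attack, and the score_fn parameter are hypothetical.

import random
from typing import Callable, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def candidate_misspellings(word: str, rng: random.Random, n: int = 5) -> List[str]:
    """Generate up to `n` distinct single-edit misspellings of `word`."""
    cands = set()
    attempts = 0
    while len(cands) < n and attempts < 4 * n and len(word) >= 2:
        attempts += 1
        i = rng.randrange(len(word) - 1)
        op = rng.choice(["swap", "delete", "insert", "substitute"])
        if op == "swap":          # transpose adjacent characters
            cand = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        elif op == "delete":      # drop one character
            cand = word[:i] + word[i + 1:]
        elif op == "insert":      # insert a random character
            cand = word[:i] + rng.choice(ALPHABET) + word[i:]
        else:                     # substitute one character
            cand = word[:i] + rng.choice(ALPHABET) + word[i + 1:]
        if cand != word:
            cands.add(cand)
    return list(cands)

def greedy_attack(sentence: str, score_fn: Callable[[str], float],
                  budget: int = 3, seed: int = 0) -> str:
    """Greedily misspell up to `budget` words, keeping at each step the
    single edit that lowers the model's confidence the most."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(budget):
        best_score = score_fn(" ".join(words))
        best_edit = None
        for i, w in enumerate(words):
            for cand in candidate_misspellings(w, rng):
                trial = words[:i] + [cand] + words[i + 1:]
                s = score_fn(" ".join(trial))
                if s < best_score:
                    best_score, best_edit = s, (i, cand)
        if best_edit is None:     # no remaining edit hurts the model
            break
        words[best_edit[0]] = best_edit[1]
    return " ".join(words)

In use, score_fn could wrap any classifier and return its probability for the gold label, so the attack succeeds when that probability falls far enough to flip the prediction. An approximate search of this kind evaluates only a small pool of candidates per word instead of enumerating all possible edits, which is the kind of cost reduction the abstract refers to.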