Adversarial Semantic Collisions

Paper Title

Adversarial Semantic Collisions

Authors

Congzheng Song, Alexander M. Rush, Vitaly Shmatikov

Abstract

We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks that rely on analyzing the meaning and similarity of texts (including paraphrase identification, document retrieval, response suggestion, and extractive summarization) are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to the top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at https://github.com/csong27/collision-bert.
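To make the idea of a gradient-guided collision search concrete, here is a minimal toy sketch. It is not the authors' method or code (that lives in the repository linked above). It assumes a hypothetical similarity model that scores two texts by the cosine of their mean token embeddings, with random embeddings and made-up token names. The search uses the gradient of that score to propose token swaps and keeps a swap only if the true score improves.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vec(emb, tokens):
    vecs = [emb[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def similarity(emb, query, text):
    """Toy model (an assumption): cosine of mean token embeddings."""
    q, t = mean_vec(emb, query), mean_vec(emb, text)
    return dot(q, t) / (math.sqrt(dot(q, q)) * math.sqrt(dot(t, t)))


def grad_wrt_text_mean(q, t):
    """Gradient of cosine(q, t) with respect to the text vector t."""
    nq, nt = math.sqrt(dot(q, q)), math.sqrt(dot(t, t))
    c = dot(q, t) / (nq * nt)
    return [(qi / nq - c * ti / nt) / nt for qi, ti in zip(q, t)]


def find_collision(emb, query, candidates, length, sweeps=5, seed=0):
    """Greedy token swaps proposed by a first-order (gradient) estimate."""
    rng = random.Random(seed)
    q = mean_vec(emb, query)
    text = [rng.choice(candidates) for _ in range(length)]
    for _ in range(sweeps):
        for i in range(length):
            grad = grad_wrt_text_mean(q, mean_vec(emb, text))
            # The token whose embedding aligns best with the gradient is
            # the swap with the largest estimated gain at position i.
            best = max(candidates, key=lambda w: dot(emb[w], grad))
            trial = text[:i] + [best] + text[i + 1:]
            # Accept only if the real score goes up.
            if similarity(emb, query, trial) > similarity(emb, query, text):
                text = trial
    return text


# Hypothetical vocabulary: query tokens are excluded from the candidates,
# so the result shares no tokens with the query.
rng = random.Random(1)
query = ["q1", "q2", "q3"]
vocab = query + [f"w{i}" for i in range(50)]
emb = {w: [rng.gauss(0, 1) for _ in range(8)] for w in vocab}
candidates = [w for w in vocab if w not in query]

baseline = find_collision(emb, query, candidates, length=6, sweeps=0)
collision = find_collision(emb, query, candidates, length=6, sweeps=5)
print("random text score:", similarity(emb, query, baseline))
print("collision score:  ", similarity(emb, query, collision))
```

Because both calls use the same seed, `baseline` is the random starting text of the search, so the two printed scores show how far the swaps moved the similarity. A real attack would target a trained model and would also need to handle fluency constraints such as the perplexity-based filtering mentioned in the abstract, which this sketch ignores.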
