Paper Title

Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning

Paper Authors

Cinà, Antonio Emanuele, Grosse, Kathrin, Demontis, Ambra, Vascon, Sebastiano, Zellinger, Werner, Moser, Bernhard A., Oprea, Alina, Biggio, Battista, Pelillo, Marcello, Roli, Fabio

Paper Abstract

The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.
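
To make the threat concrete, below is a minimal, illustrative sketch of one of the simplest poisoning strategies in the literature the survey covers: label flipping. This is not the paper's method or any specific attack it evaluates; the toy dataset, the scikit-learn LogisticRegression model, and the flip_labels helper are all assumptions chosen for brevity.

```python
# Minimal sketch (not from the paper): label-flipping data poisoning.
# An attacker who can corrupt a fraction of training labels degrades
# the model's accuracy on clean test data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification data standing in for a real training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

def flip_labels(y, fraction, rng):
    """Return a copy of y with `fraction` of its labels flipped (0 <-> 1)."""
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

for fraction in (0.0, 0.2, 0.4):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, flip_labels(y_train, fraction, rng))
    print(f"poisoned fraction={fraction:.1f}  "
          f"clean test accuracy={clf.score(X_test, y_test):.3f}")
```

Running the loop typically shows test accuracy dropping as the poisoned fraction grows, which is precisely the training-time compromise of test-time performance that the survey systematizes.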
