Paper Title
Reconstructing Training Data from Trained Neural Networks
Paper Authors
Abstract
Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications for privacy, as it can be used as an attack for revealing sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.
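The abstract does not spell out the reconstruction objective, so the following is only a minimal sketch of how such a scheme could be set up. It assumes that the implicit-bias result in question is the characterization of gradient-trained homogeneous networks as approximate KKT points of a margin-maximization problem, whose stationarity condition reads theta = sum_i lambda_i * y_i * grad_theta f(theta; x_i) with lambda_i >= 0. Under that assumption, candidate samples x_i and coefficients lambda_i can be scored by how far they are from satisfying this condition for the known trained parameters theta. The names `stationarity_residual` and `linear_grad`, and the use of a linear model in place of an MLP, are illustrative choices and do not come from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def linear_grad(theta: Vector, x: Vector) -> Vector:
    """Gradient of the toy model f(theta; x) = theta . x with respect to theta.

    A linear model stands in for the MLP here (assumption for illustration);
    for a real network this would be the parameter gradient of its output.
    """
    return list(x)


def stationarity_residual(
    theta: Vector,
    xs: Sequence[Vector],
    ys: Sequence[float],
    lams: Sequence[float],
    grad_fn: Callable[[Vector, Vector], Vector] = linear_grad,
) -> float:
    """Squared norm of theta - sum_i lam_i * y_i * grad_theta f(theta; x_i).

    A value near zero means the candidates (xs, lams) are consistent with
    the assumed KKT stationarity condition for the given parameters.
    """
    combo = [0.0] * len(theta)
    for x, y, lam in zip(xs, ys, lams):
        g = grad_fn(theta, x)
        for k in range(len(theta)):
            combo[k] += lam * y * g[k]
    return sum((t - c) ** 2 for t, c in zip(theta, combo))


# Toy check: theta is built so that the "true" samples satisfy the condition.
train_x = [[1.0, 0.0], [0.0, 2.0]]
train_y = [1.0, -1.0]
lams = [0.5, 0.25]
theta = [0.5, -0.5]  # 0.5 * (+1) * [1, 0] + 0.25 * (-1) * [0, 2]

loss_true = stationarity_residual(theta, train_x, train_y, lams)
wrong_x = [[0.0, 1.0], [1.0, 1.0]]
loss_wrong = stationarity_residual(theta, wrong_x, train_y, lams)
print(loss_true, loss_wrong)
```

In this toy setting the true samples give a zero residual and the mismatched candidates give a positive one. A reconstruction attack along these lines would treat the candidate samples and coefficients as free variables and minimize the residual with a gradient-based optimizer; for a linear model the condition is heavily underdetermined, so the sketch shows only the form of the objective and not the recovery of real images.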