Paper title
Syfer: Neural Obfuscation for Private Data Release
Paper authors
Paper abstract
Balancing privacy and predictive utility remains a central challenge for machine learning in healthcare. In this paper, we develop Syfer, a neural obfuscation method that protects against re-identification attacks. Syfer composes trained layers with random neural networks to encode the original data (e.g., X-rays) while preserving the ability to predict diagnoses from the encoded data. The randomness in the encoder acts as the data owner's private key. We quantify privacy as the number of guesses an attacker needs to re-identify a single image (guesswork), and we propose a contrastive learning algorithm to estimate guesswork. We show empirically that differentially private methods, such as DP-Image, obtain privacy at a significant loss of utility. In contrast, Syfer achieves strong privacy while preserving utility. For example, X-ray classifiers built with DP-Image, Syfer, and the original data achieve average AUCs of 0.53, 0.78, and 0.86, respectively.
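The abstract defines privacy as guesswork, the number of guesses an attacker needs to re-identify one image, but gives no formula. The following is a minimal sketch of that metric, assuming the attacker already has a score for every (encoded image, candidate original) pair, for example from a contrastively trained matching model (not shown), and tries candidates in order of decreasing score. The helpers `guesswork` and `mean_guesswork`, the random tie-breaking convention, and the toy scores are all hypothetical illustrations, not the paper's estimator.

```python
from typing import List, Sequence


def guesswork(scores: Sequence[float], true_index: int) -> float:
    """Expected number of guesses to hit the true candidate.

    Hypothetical helper (assumption, not from the paper): the attacker guesses
    candidates in order of decreasing score, breaking ties uniformly at random.
    """
    true_score = scores[true_index]
    # Candidates scored strictly higher are always guessed before the true one.
    higher = sum(1 for i, s in enumerate(scores) if i != true_index and s > true_score)
    # Each tied candidate precedes the true one with probability 1/2.
    tied = sum(1 for i, s in enumerate(scores) if i != true_index and s == true_score)
    return 1 + higher + tied / 2


def mean_guesswork(score_matrix: List[List[float]]) -> float:
    """Average guesswork over all encoded samples.

    score_matrix[i][j] is the attacker's score that encoded sample i came from
    original candidate j; the true match of row i is assumed to be column i.
    """
    values = [guesswork(row, i) for i, row in enumerate(score_matrix)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy scores (made up for illustration), e.g. from a contrastive matcher.
    scores = [
        [0.9, 0.1, 0.3],  # true match ranked first -> 1 guess
        [0.8, 0.2, 0.5],  # true match ranked last  -> 3 guesses
        [0.1, 0.4, 0.7],  # true match ranked first -> 1 guess
    ]
    print(mean_guesswork(scores))  # prints 1.6666666666666667
```

Under this convention, an attacker whose scores carry no information (all scores equal) needs (n + 1)/2 guesses on average over n candidates, which is the best a private encoding can achieve; a mean guesswork close to 1 means the encoded images are easily re-identified.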