Paper Title
Learning to Generate Image Embeddings with User-level Differential Privacy
Paper Authors
Paper Abstract
Small on-device models have been successfully trained with user-level differential privacy (DP) for next-word prediction and image classification tasks in the past. However, existing methods can fail when directly applied to learn embedding models using supervised training data with a large class space. To achieve user-level DP for large image-to-embedding feature extractors, we propose DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, to train from user-partitioned data centralized in the datacenter. DP-FedEmb combines virtual clients, partial aggregation, private local fine-tuning, and public pretraining to achieve strong privacy-utility trade-offs. We apply DP-FedEmb to train image embedding models for faces, landmarks, and natural species, and demonstrate its superior utility under the same privacy budget on the benchmark datasets DigiFace, EMNIST, GLD, and iNaturalist. We further show that it is possible to achieve strong user-level DP guarantees of $ε<4$ while keeping the utility drop within 5%, when millions of users can participate in training.
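The per-user sensitivity control and noise addition mentioned above can be illustrated with a minimal sketch of one DP federated-averaging round: each user's model update is clipped to a fixed L2 norm before aggregation, and calibrated Gaussian noise is added to the sum. This is a simplified illustration of the general DP-FedAvg pattern, not the paper's actual DP-FedEmb algorithm (which additionally uses virtual clients and partial aggregation); the function name and parameters are hypothetical.

```python
import numpy as np

def dp_fedavg_round(user_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One simplified DP aggregation round over per-user model updates.

    Hypothetical sketch: clips each user's update to L2 norm `clip_norm`
    (bounding per-user sensitivity), sums, adds Gaussian noise scaled by
    `noise_multiplier * clip_norm`, and averages.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for update in user_updates:
        norm = np.linalg.norm(update)
        # Scale down any update whose norm exceeds the clip threshold.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(update * scale)
    total = np.sum(clipped, axis=0)
    # Noise std is proportional to the per-user sensitivity (clip_norm).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)
```

With `noise_multiplier=0` the function reduces to plain averaging of clipped updates, which makes the sensitivity-bounding step easy to inspect in isolation.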