Paper Title


Revealing Secrets From Pre-trained Models

Authors

Mujahid Al Rafi, Yuan Feng, Hyeran Jeon

Abstract


With the growing burden of training deep learning models with large data sets, transfer-learning has been widely adopted in many emerging deep learning algorithms. Transformer models such as BERT are the main player in natural language processing and use transfer-learning as a de facto standard training method. A few big data companies release pre-trained models that are trained with a few popular datasets with which end users and researchers fine-tune the model with their own datasets. Transfer-learning significantly reduces the time and effort of training models. However, it comes at the cost of security concerns. In this paper, we show a new observation that pre-trained models and fine-tuned models have significantly high similarities in weight values. Also, we demonstrate that there exist vendor-specific computing patterns even for the same models. With these new findings, we propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model with vendor-specific computing patterns and then estimates the entire model weights based on the weight value similarities between the fine-tuned model and pre-trained model. We also show that the weight similarity can be leveraged for increasing the model extraction feasibility through a novel weight extraction pruning.
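The paper's key observation, that fine-tuned weights stay numerically close to their pre-trained counterparts, can be illustrated with a minimal sketch. This is a hypothetical demonstration using synthetic weight matrices (small-noise perturbation standing in for fine-tuning), not real BERT checkpoints or the paper's actual measurement procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "pre-trained" weight matrix (stand-in for one BERT layer).
pretrained = rng.normal(size=(768, 768))

# Model fine-tuning as a small perturbation of the pre-trained weights,
# reflecting the paper's observation that fine-tuning changes weights little.
finetuned = pretrained + 0.01 * rng.normal(size=pretrained.shape)

def cosine_similarity(a, b):
    """Cosine similarity between two flattened weight tensors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(pretrained, finetuned)
print(f"weight cosine similarity: {sim:.4f}")  # close to 1.0
```

Under this kind of similarity, an attacker who identifies which public pre-trained model a black-box victim was fine-tuned from already holds a close approximation of the victim's weights, which is what makes the proposed extraction attack feasible.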
