Paper Title

ADMP: An Adversarial Double Masks Based Pruning Framework For Unsupervised Cross-Domain Compression

Authors

Xiaoyu Feng, Zhuqing Yuan, Guijin Wang, Yongpan Liu

Abstract

Despite the recent progress of network pruning, directly applying it to Internet of Things (IoT) applications still faces two challenges: the distribution divergence between end and cloud data, and the lack of data labels on end devices. One straightforward solution is to combine unsupervised domain adaptation (UDA) with pruning: for example, the model is first pruned on the cloud and then transferred from cloud to end by UDA. However, such a naive combination suffers severe performance degradation. Hence this work proposes Adversarial Double Masks based Pruning (ADMP) for such cross-domain compression. In ADMP, we construct a knowledge distillation framework not only to produce pseudo labels but also to measure domain divergence as the output difference between the full-size teacher and the pruned student. Unlike existing mask-based pruning works, ADMP adopts two adversarial masks, i.e. a soft mask and a hard mask, so it can prune the model effectively while still allowing it to extract strong domain-invariant features and learn robust classification boundaries. During training, the Alternating Direction Method of Multipliers (ADMM) is used to overcome the binary constraint of {0,1}-masks. On the Office31 and ImageCLEF-DA datasets, the proposed ADMP can prune 60% of channels with only 0.2% and 0.3% average accuracy loss respectively. Compared with the state of the art, we achieve about 1.63x parameter reduction and 4.1% and 5.1% accuracy improvement.
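The abstract names three concrete mechanisms: a teacher-student distillation loss that doubles as a domain-divergence measure, channel masks split into a trainable soft mask and a binary hard mask, and an ADMM projection that keeps the hard mask in {0,1}. The PyTorch sketch below shows how such pieces could fit together; it is a minimal illustration under our own assumptions (the names MaskedConv, project_binary, keep_ratio, and the temperature T are hypothetical), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_losses(teacher_logits, student_logits, T=4.0):
    """Pseudo-label loss plus a teacher-student divergence term.

    The teacher's argmax serves as a pseudo label for unlabeled target
    data; the KL term measures how far the pruned student's outputs
    drift from the full-size teacher's (a proxy for domain divergence).
    """
    pseudo = teacher_logits.argmax(dim=1)
    ce = F.cross_entropy(student_logits, pseudo)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce, kl

class MaskedConv(nn.Module):
    """Conv layer whose output channels are gated by a soft mask
    (real-valued, trained by SGD) and a hard mask (kept binary by the
    ADMM-style projection below)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.soft_mask = nn.Parameter(torch.ones(out_ch))
        self.register_buffer("hard_mask", torch.ones(out_ch))

    def forward(self, x):
        y = self.conv(x)
        gate = (self.soft_mask * self.hard_mask).view(1, -1, 1, 1)
        return y * gate

def project_binary(z, keep_ratio=0.4):
    """Euclidean projection onto {0,1} vectors with a channel budget:
    the top-k entries become 1, the rest 0."""
    k = max(1, int(keep_ratio * z.numel()))
    out = torch.zeros_like(z)
    out[z.topk(k).indices] = 1.0
    return out
```

In a typical ADMM pruning loop, a projection like project_binary plays the role of the auxiliary-variable update (projection onto the constraint set of binary masks with a fixed channel budget), while the network weights and the soft mask are updated by ordinary gradient descent; keeping the top-k scores is the standard Euclidean projection onto that set.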
