Paper Title
Deep Model Reassembly
Paper Authors
Paper Abstract
In this paper, we explore a novel knowledge-transfer task, termed Deep Model Reassembly (DeRy), for general-purpose model reuse. Given a collection of heterogeneous models pre-trained from distinct sources and with diverse architectures, the goal of DeRy, as its name implies, is to first dissect each model into distinctive building blocks, and then selectively reassemble the derived blocks to produce customized networks under both hardware-resource and performance constraints. The ambitious nature of DeRy inevitably imposes significant challenges, including, in the first place, the feasibility of its solution. We strive to showcase that, through the dedicated paradigm proposed in this paper, DeRy can be made not only possible but also practically efficient. Specifically, we jointly partition all pre-trained networks via a cover-set optimization and derive a number of equivalence sets, within each of which the network blocks are treated as functionally equivalent and hence interchangeable. The equivalence sets learned in this way, in turn, enable picking and assembling blocks to customize networks subject to certain constraints, which is achieved by solving an integer program backed by a training-free proxy that estimates task performance. The reassembled models give rise to gratifying performance while satisfying the user-specified constraints. We demonstrate that, on ImageNet, the best reassembled model achieves 78.6% top-1 accuracy without fine-tuning, which can be further elevated to 83.2% with end-to-end training. Our code is available at https://github.com/Adamdad/DeRy
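The workflow described in the abstract (dissect pre-trained models into blocks, group the blocks into equivalence sets, then pick one block per position under a resource budget using a training-free proxy) can be illustrated with a small sketch. The snippet below is a toy, conceptual approximation in Python, not the authors' implementation: the `Block` fields, the proxy scores, and the brute-force search standing in for the paper's integer program are all illustrative assumptions; refer to the repository above for the actual method.

```python
# Minimal conceptual sketch of the DeRy selection step (illustrative only).
# All names, fields, and scores below are assumptions, not the official API.
from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass
class Block:
    source_model: str   # which pre-trained network the block was cut from
    stage: int          # depth position, i.e. index of its equivalence set
    params_m: float     # parameter count in millions (resource cost)
    proxy_score: float  # training-free estimate of transferability

def reassemble(equivalence_sets: List[List[Block]], param_budget_m: float):
    """Pick one block per equivalence set so the summed proxy score is
    maximized while total parameters stay within the budget. The paper
    formulates this as an integer program; brute force suffices here
    only because the toy search space is tiny."""
    best_combo, best_score = None, float("-inf")
    for combo in product(*equivalence_sets):
        cost = sum(b.params_m for b in combo)
        score = sum(b.proxy_score for b in combo)
        if cost <= param_budget_m and score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score

if __name__ == "__main__":
    # Three depth positions ("equivalence sets"), each holding interchangeable
    # blocks dissected from different pre-trained models (values are made up).
    sets = [
        [Block("ResNet-50", 0, 5.0, 0.62), Block("Swin-T", 0, 7.0, 0.70)],
        [Block("ResNet-50", 1, 10.0, 0.55), Block("ViT-B", 1, 20.0, 0.74)],
        [Block("Swin-T", 2, 8.0, 0.66), Block("ViT-B", 2, 25.0, 0.81)],
    ]
    combo, score = reassemble(sets, param_budget_m=40.0)
    print([f"{b.source_model}/stage{b.stage}" for b in combo], round(score, 2))
```

Running the sketch prints the block combination with the highest summed proxy score that fits the 40M-parameter budget, mirroring in spirit, though not in mechanism, the constrained block selection DeRy performs over learned equivalence sets.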