Paper Title

Parameter-Efficient Masking Networks

Authors

Yue Bai, Huan Wang, Xu Ma, Yitian Zhang, Zhiqiang Tao, Yun Fu

Abstract

A deeper network structure generally handles more complicated non-linearity and performs more competitively. Nowadays, advanced network designs often contain a large number of repetitive structures (e.g., Transformer). They empower the network capacity to a new level but also inevitably increase the model size, which is unfriendly to either model storing or transferring. In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning diverse masks, and we introduce Parameter-Efficient Masking Networks (PEMN). This also naturally leads to a new paradigm for model compression that diminishes the model size. Concretely, motivated by the repetitive structures in modern neural networks, we utilize one randomly initialized layer, accompanied by different masks, to convey different feature mappings and represent repetitive network modules. Therefore, the model can be expressed as \textit{one-layer} with a bunch of masks, which significantly reduces the model storage cost. Furthermore, we enhance our strategy by learning masks for a model filled by padding a given random weight vector. In this way, our method can further lower the space complexity, especially for models without many repetitive architectures. We validate the potential of PEMN learning masks on random weights with limited unique values and test its effectiveness for a new compression paradigm based on different network architectures. Code is available at https://github.com/yueb17/PEMN
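The core idea above can be illustrated with a minimal NumPy sketch: one fixed, randomly initialized weight matrix with limited unique values is shared across "layers", and each repeated module is distinguished only by its own binary mask; the padding variant fills a larger weight shape by tiling one short random prototype vector. All names, shapes, and the sign-quantized weights here are illustrative assumptions, and the actual mask learning (done by backpropagation in PEMN) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed, randomly initialized weight matrix shared by all "layers".
# Sign quantization gives weights with limited unique values {-1, +1}
# (an illustrative choice, not necessarily the paper's exact scheme).
shared_weights = np.sign(rng.standard_normal((4, 4)))

def masked_layer(x, mask):
    """Each repeated module = the SAME shared weights, its own binary mask."""
    return np.maximum(x @ (shared_weights * mask), 0.0)  # ReLU activation

# Two distinct feature mappings from the same weights, different masks.
# (In PEMN the masks are learned; here they are random for illustration.)
mask_a = (rng.random((4, 4)) > 0.5).astype(float)
mask_b = (rng.random((4, 4)) > 0.5).astype(float)

x = rng.standard_normal((1, 4))
out_a = masked_layer(x, mask_a)
out_b = masked_layer(x, mask_b)

# Padding variant: fill an arbitrary weight shape by tiling one short
# random prototype vector, so storage cost = one vector + the masks.
prototype = np.sign(rng.standard_normal(8))

def pad_fill(shape):
    n = int(np.prod(shape))
    reps = -(-n // prototype.size)  # ceiling division
    return np.tile(prototype, reps)[:n].reshape(shape)

w_big = pad_fill((6, 5))  # a 6x5 weight matrix built from 8 stored values
```

Because only the short prototype vector and the binary masks need to be stored, the space complexity drops even for architectures without repetitive modules.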
