Paper Title

Structural Dropout for Model Width Compression

Paper Authors

Knodt, Julian

Paper Abstract

Existing ML models are known to be highly over-parametrized, and use significantly more resources than required for a given task. Prior work has explored compressing models offline, such as by distilling knowledge from larger models into much smaller ones. This is effective for compression, but does not give an empirical method for measuring how much the model can be compressed, and requires additional training for each compressed model. We propose a method that requires only a single training session for the original model and a set of compressed models. The proposed approach is a "structural" dropout that prunes all elements in the hidden state above a randomly chosen index, forcing the model to learn an importance ordering over its features. After learning this ordering, unimportant features can be pruned at inference time while retaining most of the accuracy, significantly reducing parameter size. In this work, we focus on Structural Dropout for fully-connected layers, but the concept can be applied to any kind of layer with unordered features, such as convolutional or attention layers. Structural Dropout requires no additional pruning/retraining, but does require additional validation for each possible hidden size. At inference time, a non-expert can select the memory versus accuracy trade-off that best suits their needs, choosing from a wide range of models spanning highly compressed to more accurate variants.
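To make the mechanism concrete, below is a minimal sketch of the "structural" dropout idea described in the abstract, written against PyTorch. The class name `StructuralDropout`, the `max_width` parameter, and the rescaling step are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class StructuralDropout(nn.Module):
    """During training, zero every hidden feature above a randomly chosen
    cutoff index, so that earlier features must carry more of the signal."""

    def __init__(self, max_width: int):
        super().__init__()
        self.max_width = max_width  # full width of the hidden state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # At inference time this layer is a no-op; compression is instead done
        # by slicing the hidden state (and surrounding weights) to the chosen width.
        if not self.training:
            return x
        # Sample a cutoff uniformly in [1, max_width]; keep features [0, cutoff).
        cutoff = int(torch.randint(1, self.max_width + 1, (1,)))
        mask = torch.zeros_like(x)
        mask[..., :cutoff] = 1.0
        # Rescale to preserve expected magnitude, analogous to standard dropout;
        # whether the paper rescales this way is an assumption here.
        return x * mask * (self.max_width / cutoff)
```

As a hypothetical usage example, the layer would sit after a fully-connected layer, e.g. `nn.Sequential(nn.Linear(d_in, w), StructuralDropout(w), nn.ReLU(), nn.Linear(w, d_out))`; after training, one would validate accuracy at several candidate widths and then slice the adjacent weight matrices to the selected width.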
