Paper Title

Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation

Authors

Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji

Abstract

Knowledge distillation is one of the primary methods of transferring knowledge from large to small models. However, it requires massive task-specific data, which may not be plausible in many real-world applications. Data augmentation methods such as representation interpolation, token replacement, or augmentation with models are applied to tackle this problem. However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models). To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. Our method builds on top of representation interpolation augmentation methods to maintain the diversity of expressions and converts the augmented data to tokens to avoid shifting decision boundaries. It uses simple operations that come with little computational overhead. The results on multiple GLUE tasks show that our methods can improve distillation performance by a large margin at a low time cost. Codes are available at https://github.com/google-research/google-research/tree/master/augpro.
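
The abstract describes AugPro's two core operations: mixup-style interpolation of token representations, followed by a projection of the augmented representations back to discrete tokens. The sketch below is a minimal illustration of that idea, assuming PyTorch and a nearest-neighbor projection onto the model's embedding table; the function name and the exact projection rule are hypothetical and may differ from the paper's actual implementation.

```python
import torch

def augpro_example(emb_a, emb_b, embedding_table, lam=0.5):
    """Hypothetical sketch of the interpolate-then-project idea.

    emb_a, emb_b:    (seq_len, hidden) token embeddings of two inputs
    embedding_table: (vocab_size, hidden) model embedding matrix
    lam:             interpolation coefficient in [0, 1]
    """
    # Step 1: representation interpolation (mixup on embeddings).
    mixed = lam * emb_a + (1.0 - lam) * emb_b    # (seq_len, hidden)

    # Step 2: projection -- snap each interpolated vector to the
    # nearest vocabulary embedding so the augmented example is again
    # a sequence of discrete tokens (assumed nearest-neighbor rule).
    dists = torch.cdist(mixed, embedding_table)  # (seq_len, vocab_size)
    return dists.argmin(dim=-1)                  # (seq_len,) token ids
```

Both steps use only elementwise arithmetic and a distance lookup, which is consistent with the abstract's claim that the method adds little computational overhead compared with model-based augmentation.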
