Paper Title

BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning

Authors

Mohsen Fayyaz, Ehsan Aghazadeh, Ali Modarressi, Mohammad Taher Pilehvar, Yadollah Yaghoobzadeh, Samira Ebrahimi Kahou

Abstract

Current pre-trained language models rely on large datasets for achieving state-of-the-art performance. However, past research has shown that not all examples in a dataset are equally important during training. In fact, it is sometimes possible to prune a considerable fraction of the training set while maintaining the test performance. Established on standard vision benchmarks, two gradient-based scoring metrics for finding important examples are GraNd and its estimated version, EL2N. In this work, we employ these two metrics for the first time in NLP. We demonstrate that these metrics need to be computed after at least one epoch of fine-tuning and they are not reliable in early steps. Furthermore, we show that by pruning a small portion of the examples with the highest GraNd/EL2N scores, we can not only preserve the test accuracy, but also surpass it. This paper details adjustments and implementation choices which enable GraNd and EL2N to be applied to NLP.
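For a concrete picture of the estimated metric, below is a minimal sketch of computing per-example EL2N scores, i.e. the L2 norm of the error between the model's softmax output and the one-hot label, after some fine-tuning. This is not the authors' released code: the `el2n_scores` helper, the PyTorch setup, and the Hugging Face-style classifier interface (a model whose output exposes `.logits`) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def el2n_scores(model, dataloader, num_labels, device="cpu"):
    """Return one EL2N score per example: || softmax(f(x)) - onehot(y) ||_2."""
    model.eval()
    scores = []
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        # Assumes a Hugging Face-style sequence classifier exposing .logits.
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        probs = F.softmax(logits, dim=-1)                            # predicted distribution
        onehot = F.one_hot(labels, num_classes=num_labels).float()   # true label as one-hot
        scores.append(torch.linalg.norm(probs - onehot, dim=-1))     # per-example error norm
    return torch.cat(scores)
```

Consistent with the abstract's finding, such scores would be computed only after at least one epoch of fine-tuning, and a small fraction of the highest-scoring examples would then be pruned from the training set.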
