Paper Title
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Paper Authors
Paper Abstract
We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
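The core idea described in the abstract, keeping the pretrained weights frozen and training only an element-wise binary mask over them, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; the class name `BinaryMaskedLinear`, the fixed threshold, the initialization of the mask scores, and the straight-through estimator used to train the binary mask are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BinaryMaskedLinear(nn.Module):
    """Linear layer with a frozen pretrained weight and a learned
    element-wise binary mask (sketch, not the paper's code)."""

    def __init__(self, pretrained_linear: nn.Linear, threshold: float = 0.5):
        super().__init__()
        # Pretrained parameters are copied and frozen; they are never updated.
        self.weight = nn.Parameter(pretrained_linear.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = (nn.Parameter(pretrained_linear.bias.detach().clone(),
                                  requires_grad=False)
                     if pretrained_linear.bias is not None else None)
        # Real-valued mask scores are the only trainable parameters.
        # Initializing slightly above the threshold starts from an all-ones
        # mask, i.e., the unmodified pretrained layer (an assumption).
        self.mask_scores = nn.Parameter(
            torch.full_like(self.weight, threshold + 0.01))
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hard binary mask in the forward pass.
        hard_mask = (self.mask_scores > self.threshold).float()
        # Straight-through estimator: the forward value equals hard_mask,
        # but gradients flow to mask_scores as if the mask were identity.
        mask = hard_mask + self.mask_scores - self.mask_scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

In such a setup, per-task storage reduces to one binary mask per masked layer rather than a full copy of the finetuned weights, which is the memory advantage the abstract refers to when several tasks must be served simultaneously.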