Paper Title

PALBERT: Teaching ALBERT to Ponder

Paper Authors

Nikita Balagansky, Daniil Gavrilov

Paper Abstract

Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence in wrong predictions (namely, overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. The recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable. However, the originally proposed exit criterion, which relies on sampling from the trained posterior distribution over the probability of exiting from the $i$-th layer, introduces major variance in exit layer indices, significantly reducing the resulting model's performance. In this paper, we propose improving PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We adapted the proposed mechanism to ALBERT and RoBERTa and compared it with recent methods for performing an early exit. We observed that the proposed changes can be considered significant improvements over the original PonderNet architecture and outperform PABEE on a wide range of GLUE tasks. In addition, we performed an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.
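To make the contrast in the abstract concrete, below is a minimal sketch (not the authors' code) of the two exit criteria it mentions. It assumes, as in PonderNet, that each layer $i$ produces a halting probability $\lambda_i$, and it reads "Q-exit" as exiting at the first layer whose cumulative exit probability passes a fixed threshold; the exact definition, the threshold value, and the example `lambdas` are illustrative assumptions not given in the abstract.

```python
import random


def sample_exit(lambdas, rng=None):
    """Original PonderNet-style criterion: sample a Bernoulli at each layer.

    The exit layer is a random variable, so repeated runs on the same input
    can exit at different depths (the variance the abstract criticizes).
    """
    rng = rng or random.Random()
    for i, lam in enumerate(lambdas):
        if rng.random() < lam:      # exit with probability lambda_i
            return i
    return len(lambdas) - 1          # forced exit at the last layer


def q_exit(lambdas, q=0.5):
    """Deterministic criterion (assumed reading of "Q-exit"): exit at the
    first layer where the cumulative exit probability
    P(exit <= i) = 1 - prod_{j<=i}(1 - lambda_j) reaches the threshold q.
    """
    keep_going = 1.0                 # probability of not having exited yet
    for i, lam in enumerate(lambdas):
        keep_going *= (1.0 - lam)
        if 1.0 - keep_going >= q:
            return i
    return len(lambdas) - 1


# Hypothetical per-layer halting probabilities for a 5-layer model.
lambdas = [0.05, 0.10, 0.30, 0.60, 0.90]
print([sample_exit(lambdas, random.Random(s)) for s in range(5)])  # varies with the seed
print(q_exit(lambdas))                                             # always the same layer
```

The point of the sketch is only that the sampled criterion returns a different exit index on different draws, while the cumulative-threshold criterion is deterministic for a given input; the paper's actual architecture and training objective are not reproduced here.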
