Paper Title
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Paper Authors
Paper Abstract
Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, their huge parameter size makes them difficult to deploy in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models, but such compression is task-independent, i.e., the same compressed BERT is used for all downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as a search constraint, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that these task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in parameter size, while maintaining comparable performance.
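To make the abstract's objective concrete, here is a minimal PyTorch-style sketch of how a task loss, a knowledge-distillation term (search hints from the fine-tuned BERT teacher), and a differentiable efficiency penalty over the relaxed architecture distribution could be combined. The function name `adabert_style_loss`, the weights `gamma`/`beta`, the `temperature`, and the `op_sizes` tensor (per-operation parameter counts) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adabert_style_loss(student_logits, teacher_logits, labels,
                       arch_params, op_sizes,
                       gamma=0.8, beta=4.0, temperature=1.0):
    """Illustrative combined objective for task-adaptive compression search.

    Assumed inputs:
      student_logits / teacher_logits: [batch, num_classes]
      labels: [batch] ground-truth class indices
      arch_params: [num_ops] relaxed (continuous) architecture parameters
      op_sizes: [num_ops] parameter count (or FLOPs) of each candidate op
    """
    # Task loss on ground-truth labels
    ce = F.cross_entropy(student_logits, labels)

    # Task-oriented distillation: match softened teacher predictions
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Efficiency constraint: expected model size under the softmax-relaxed
    # architecture distribution (differentiable w.r.t. arch_params)
    expected_size = (F.softmax(arch_params, dim=-1) * op_sizes).sum()

    return ce + gamma * kd + beta * expected_size
```

Because every term is differentiable, gradients flow both to the student weights and to the architecture parameters, which is what allows the search to trade off effectiveness (task and distillation losses) against efficiency (expected size) per downstream task.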