Paper Title
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
Paper Authors
Paper Abstract
Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient algorithm to search for fast models while maintaining model quality. We describe a novel approach to decompose the Transformer architecture into smaller components, and propose a sampling-based one-shot architecture search method to find an optimal model for inference. The model search process is more efficient than alternatives, adding only a small overhead to training time. By applying our methods to BERT-base architectures, we achieve 10% to 30% speedup for pre-trained BERT and 70% speedup on top of a previous state-of-the-art distilled BERT model on Cloud TPU-v2 with a generally acceptable drop in performance.
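To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of sampling-based one-shot architecture search: a weight-sharing "supernet" whose layers are decomposed into interchangeable components, with one sub-architecture sampled per training step, followed by a post-training search that ranks candidates by inference speed subject to a quality floor. All names here (SEARCH_SPACE, train_step, measure_latency, etc.) are illustrative assumptions, not the paper's actual API or exact search space.

```python
# Hypothetical sketch of one-shot NAS by component composition.
# Assumptions: the Transformer layer is decomposed into components whose
# sizes are the searchable choices; all candidates share supernet weights.
import random

# Candidate sizes for each decomposed component of a Transformer layer
# (illustrative choices, not the paper's actual search space).
SEARCH_SPACE = {
    "num_heads": [4, 8, 12],       # attention heads per layer
    "ffn_dim": [512, 1024, 3072],  # feed-forward inner width
}

def sample_architecture():
    """Sample one sub-architecture: one choice per component, uniformly."""
    return {name: random.choice(opts) for name, opts in SEARCH_SPACE.items()}

def train_one_shot(num_steps, train_step):
    """One-shot training loop: each step trains a different sampled
    sub-network, so every candidate shares (and jointly trains) the
    supernet weights instead of being trained from scratch."""
    for _ in range(num_steps):
        arch = sample_architecture()
        train_step(arch)  # forward/backward through only the sampled components

def search(candidates, measure_latency, measure_quality, quality_floor):
    """After one-shot training, evaluate candidates with the shared
    weights and return the fastest one whose quality stays acceptable."""
    viable = [a for a in candidates if measure_quality(a) >= quality_floor]
    return min(viable, key=measure_latency) if viable else None
```

The key design point this sketch illustrates is why the search adds only a small overhead to training: candidate sub-networks inherit the shared supernet weights, so the costly step (training) happens once, and the search itself reduces to cheap latency and quality measurements.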