Paper Title

Standing on the Shoulders of Giants: Hardware and Neural Architecture Co-Search with Hot Start

Authors

Weiwen Jiang, Lei Yang, Sakyasingha Dasgupta, Jingtong Hu, Yiyu Shi

Abstract

Hardware and neural architecture co-search, which automatically generates Artificial Intelligence (AI) solutions from a given dataset, is promising for promoting AI democratization; however, the amount of time required by current co-search frameworks is on the order of hundreds of GPU hours for one target hardware. This inhibits the use of such frameworks on commodity hardware. The root cause of the low efficiency of existing co-search frameworks is that they start from a "cold" state (i.e., search from scratch). In this paper, we propose a novel framework, namely HotNAS, that starts from a "hot" state based on a set of existing pre-trained models (a.k.a. a model zoo) to avoid lengthy training time. As such, the search time can be reduced from 200 GPU hours to less than 3 GPU hours. In HotNAS, in addition to the hardware design space and the neural architecture search space, we further integrate a compression space to conduct model compression during the co-search, which creates new opportunities to reduce latency but also brings challenges. One of the key challenges is that all of the above search spaces are coupled with each other; e.g., compression may not work without hardware design support. To tackle this issue, HotNAS builds a chain of tools to design hardware that supports compression, based on which a global optimizer is developed to automatically co-search all the involved search spaces. Experiments on the ImageNet dataset and a Xilinx FPGA show that, within a timing constraint of 5 ms, neural architectures generated by HotNAS can achieve up to 5.79% Top-1 and 3.97% Top-5 accuracy gains, compared with the existing ones.
