Paper Title
HAPI: Hardware-Aware Progressive Inference
Paper Authors
Paper Abstract
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by exploiting the difference in the classification difficulty among samples and early-exiting at different stages of the network. Nevertheless, existing studies on early exiting have primarily focused on the training scheme, without considering the use-case requirements or the deployment platform. This work presents HAPI, a novel methodology for generating high-performance early-exit networks by co-optimising the placement of intermediate exits together with the early-exit strategy at inference time. Furthermore, we propose an efficient design space exploration algorithm which enables the faster traversal of a large number of alternative architectures and generates the highest-performing design, tailored to the use-case requirements and target hardware. Quantitative evaluation shows that our system consistently outperforms alternative search mechanisms and state-of-the-art early-exit schemes across various latency budgets. Moreover, it pushes further the performance of highly optimised hand-crafted early-exit CNNs, delivering up to 5.11x speedup over lightweight models on imposed latency-driven SLAs for embedded devices.
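To make the early-exit idea described in the abstract concrete, below is a minimal PyTorch sketch of confidence-based early exiting: an intermediate classifier is attached partway through a small CNN, and inference stops there whenever the exit's softmax confidence clears a threshold. The layer choices, exit placement, and threshold value are hypothetical illustrations only; they are not HAPI's actual architecture, exit-placement search, or exit policy.

```python
# Illustrative sketch of confidence-based early exiting (hypothetical layers and
# threshold; NOT the paper's actual design or search procedure).
import torch
import torch.nn as nn


class EarlyExitCNN(nn.Module):
    def __init__(self, num_classes: int = 10, threshold: float = 0.9):
        super().__init__()
        # Backbone split into two stages; an intermediate exit follows stage 1,
        # the final exit follows stage 2.
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.exit1 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes)
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.exit2 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )
        self.threshold = threshold  # assumed early-exit confidence threshold

    @torch.no_grad()
    def forward(self, x: torch.Tensor):
        # Run stage 1 and check the intermediate exit's confidence.
        feat = self.stage1(x)
        logits = self.exit1(feat)
        conf, pred = torch.softmax(logits, dim=1).max(dim=1)
        if conf.item() >= self.threshold:
            # Confident enough: stop here and skip the remaining computation.
            return pred, "exit1"
        # Otherwise run the rest of the network and use the final exit.
        logits = self.exit2(self.stage2(feat))
        return logits.argmax(dim=1), "exit2"


if __name__ == "__main__":
    model = EarlyExitCNN().eval()
    sample = torch.randn(1, 3, 32, 32)  # single sample; per-sample exiting assumed
    pred, exit_taken = model(sample)
    print(f"prediction={pred.item()}, exited at {exit_taken}")
```

In HAPI's terms, the knobs this sketch hard-codes, where the intermediate exits sit and how confident an exit must be before stopping, are exactly what the paper's design space exploration tunes jointly against a latency budget on the target hardware.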