Paper Title

AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance

Paper Authors

Young Geun Kim, Carole-Jean Wu

Paper Abstract

Deep learning inference is increasingly run at the edge. As the programming and system stack support becomes mature, it enables acceleration opportunities within a mobile system, where the system performance envelope is scaled up with a plethora of programmable co-processors. Thus, intelligent services designed for mobile users can choose between running inference on the CPU or any of the co-processors on the mobile system, or exploiting connected systems, such as the cloud or a nearby, locally connected system. By doing so, the services can scale out the performance and increase the energy efficiency of edge mobile systems. This gives rise to a new challenge - deciding when inference should run where. Such execution scaling decision becomes more complicated with the stochastic nature of mobile-cloud execution, where signal strength variations of the wireless networks and resource interference can significantly affect real-time inference performance and system energy efficiency. To enable accurate, energy-efficient deep learning inference at the edge, this paper proposes AutoScale. AutoScale is an adaptive and light-weight execution scaling engine built upon the custom-designed reinforcement learning algorithm. It continuously learns and selects the most energy-efficient inference execution target by taking into account characteristics of neural networks and available systems in the collaborative cloud-edge execution environment while adapting to the stochastic runtime variance. Real system implementation and evaluation, considering realistic execution scenarios, demonstrate an average of 9.8 and 1.6 times energy efficiency improvement for DNN edge inference over the baseline mobile CPU and cloud offloading, while meeting the real-time performance and accuracy requirement.
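
The abstract describes AutoScale as a lightweight engine, built on a custom-designed reinforcement learning algorithm, that continuously learns which execution target (the mobile CPU, an on-device co-processor, or a connected system such as the cloud) is most energy-efficient under stochastic runtime variance. The sketch below illustrates this style of learned target selection with a simple epsilon-greedy learner; the target list, state encoding, reward shape, and hyperparameters are illustrative assumptions, not the paper's actual formulation.

```python
import random
from collections import defaultdict

# Hypothetical execution targets; the paper's actual target set may differ.
TARGETS = ["cpu", "gpu", "dsp", "cloud"]

EPSILON = 0.1  # exploration rate (assumed)
ALPHA = 0.5    # learning rate (assumed)

# Value estimates indexed by (state, target). A state could encode the
# neural network being run plus observed variance, e.g. a signal-strength bucket.
q_table = defaultdict(float)

def select_target(state):
    """Epsilon-greedy choice over execution targets."""
    if random.random() < EPSILON:
        return random.choice(TARGETS)
    return max(TARGETS, key=lambda t: q_table[(state, t)])

def update(state, target, reward):
    """Incremental value update from the observed reward."""
    key = (state, target)
    q_table[key] += ALPHA * (reward - q_table[key])

def reward_fn(energy_joules, latency_ms, deadline_ms=50.0):
    """Reward favors low energy and penalizes deadline misses.
    The paper's actual reward formulation may differ."""
    penalty = 10.0 if latency_ms > deadline_ms else 0.0
    return -energy_joules - penalty

# Example episode: observe state, pick a target, run inference, learn.
state = ("mobilenet", "weak_signal")  # hypothetical state encoding
target = select_target(state)
# Energy and latency would come from real measurements; dummy values here.
update(state, target, reward_fn(energy_joules=0.8, latency_ms=42.0))
```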
