Paper Title

CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge

Paper Authors

Yihao Fang, Shervin Manzuri Shalmani, Rong Zheng

Paper Abstract

The success of deep neural networks (DNN) in machine perception applications such as image classification and speech recognition comes at the cost of high computation and storage complexity. Inference with uncompressed, large-scale DNN models can only run in the cloud, incurring extra communication latency back and forth between the cloud and end devices, while compressed DNN models achieve real-time inference on end devices at the price of lower predictive accuracy. To have the best of both worlds (latency and accuracy), we propose CacheNet, a model caching framework. CacheNet caches low-complexity models on end devices and high-complexity (or full) models on edge or cloud servers. By exploiting temporal locality in streaming data, a high cache hit rate, and consequently shorter latency, can be achieved with no or only a marginal decrease in prediction accuracy. Experiments on CIFAR-10 and FVG show that CacheNet is 58-217% faster than baseline approaches that run inference tasks on end devices or edge servers alone.
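To make the device-versus-server dispatch described above concrete, below is a minimal Python sketch of one way such a two-tier setup could be wired together. The `CacheNetDispatcher` class, the confidence-threshold hit test, and the `threshold=0.8` value are illustrative assumptions for this sketch, not the paper's actual cache hit criterion or model partitioning scheme.

```python
import numpy as np

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

class CacheNetDispatcher:
    """Illustrative two-tier dispatcher (not the paper's implementation).

    device_model / edge_model: callables mapping an input to class logits.
    threshold: assumed confidence cutoff deciding whether the on-device
    "cached" model's answer is trusted (a hit) or the request is forwarded
    to the full model on the edge server (a miss).
    """

    def __init__(self, device_model, edge_model, threshold=0.8):
        self.device_model = device_model
        self.edge_model = edge_model
        self.threshold = threshold

    def predict(self, x):
        probs = softmax(self.device_model(x))
        if probs.max() >= self.threshold:      # cache hit: answer locally
            return int(probs.argmax()), "device"
        probs = softmax(self.edge_model(x))    # cache miss: query the edge server
        return int(probs.argmax()), "edge"

# Toy usage with stand-in models emitting random logits over 10 classes
# (e.g. CIFAR-10); real models would be a compressed DNN and a full DNN.
rng = np.random.default_rng(0)
def small_model(x): return rng.normal(size=10)
def full_model(x): return rng.normal(size=10)

dispatcher = CacheNetDispatcher(small_model, full_model)
print(dispatcher.predict(np.zeros((32, 32, 3))))
```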
