Title
Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge
Authors
Abstract
Smart IoT-based systems often desire continuous execution of multiple latency-sensitive Deep Learning (DL) applications. Edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications function based on bulky "neural network (NN) models" that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on Bayesian theory to predict the inference requests for multi-tenant applications, and uses those predictions to choose the appropriate NN models for loading, hence increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can stimulate the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss in the inference accuracy of the applications.
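The abstract describes, but does not specify, the iWS-BFE heuristic. As a purely illustrative sketch of the general idea (a Bayesian predictor of which application will issue the next inference request, driving which full NN models to keep loaded within a memory budget), one might write something like the following. All names, the Dirichlet prior, and the greedy loading policy are our assumptions, not the paper's actual algorithm:

```python
from collections import Counter

def request_probabilities(history, apps, prior=1.0):
    """Posterior probability that each app issues the next inference
    request: a Bayesian update of a categorical distribution over apps,
    with a symmetric Dirichlet(prior) prior and the observed history.

    NOTE: illustrative only -- not the paper's iWS-BFE heuristic.
    """
    counts = Counter(history)
    total = len(history) + prior * len(apps)
    return {app: (counts[app] + prior) / total for app in apps}

def choose_models_to_load(history, model_sizes, memory_budget):
    """Greedily load full models for the most probable requesters until
    the edge memory budget is exhausted, so that likely requests get
    warm-start inferences. `model_sizes` maps app -> model size (MB)."""
    probs = request_probabilities(history, list(model_sizes))
    loaded, used = [], 0.0
    for app in sorted(probs, key=probs.get, reverse=True):
        if used + model_sizes[app] <= memory_budget:
            loaded.append(app)
            used += model_sizes[app]
    return loaded
```

For example, with a request history of `["asr", "asr", "ocr", "asr"]`, model sizes `{"asr": 2, "ocr": 3, "nmt": 4}` (MB), and a 5 MB budget, the sketch loads the models for `asr` and `ocr`; the remaining app would fall back to a compressed (e.g. quantized) model or a cold start.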