Paper Title

MDInference: Balancing Inference Accuracy and Latency for Mobile Applications

Authors

Samuel S. Ogden, Tian Guo

Abstract

Deep Neural Networks allow mobile devices to incorporate a wide range of features into user applications. However, the computational complexity of these models makes it difficult to run them effectively on resource-constrained mobile devices. Prior work approached the problem of supporting deep learning in mobile applications by either decreasing model complexity or utilizing powerful cloud servers. Each of these approaches focuses on only a single aspect of mobile inference and thus often sacrifices overall performance. In this work we introduce a holistic approach to designing mobile deep inference frameworks. We first identify the key goals of accuracy and latency for mobile deep inference and the conditions that must be met to achieve them. We demonstrate our holistic approach through the design of a hypothetical framework called MDInference. This framework leverages two complementary techniques: a model selection algorithm that chooses from a set of cloud-based deep learning models to improve inference accuracy, and an on-device request duplication mechanism to bound latency. Through empirically-driven simulations we show that MDInference improves aggregate accuracy over static approaches by over 40% without incurring SLA violations. Additionally, we show that with a target latency of 250ms, MDInference increases aggregate accuracy in 99.74% of cases on faster university networks and 96.84% of cases on residential networks.
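The two techniques described in the abstract can be illustrated with a minimal sketch: pick the most accurate cloud model whose expected latency still fits the SLA budget after accounting for network delay, and fall back to an on-device duplicate when the network is too slow. All model names, accuracy figures, and latency numbers below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of MDInference-style inference (numbers are assumptions):
# 1) model selection: choose the most accurate cloud model that fits the SLA,
# 2) request duplication: fall back on-device so latency stays bounded.

CLOUD_MODELS = [
    # (name, top-1 accuracy, expected cloud inference latency in ms)
    ("mobilenet_cloud", 0.72, 40),
    ("resnet_cloud", 0.78, 90),
    ("ensemble_cloud", 0.85, 180),
]
ON_DEVICE = ("mobilenet_device", 0.68, 60)  # always-available local fallback


def select_cloud_model(network_latency_ms, sla_ms):
    """Return the most accurate cloud model whose inference time fits the
    remaining SLA budget after network delay, or None if none fits."""
    budget = sla_ms - network_latency_ms
    feasible = [m for m in CLOUD_MODELS if m[2] <= budget]
    return max(feasible, key=lambda m: m[1]) if feasible else None


def infer(network_latency_ms, sla_ms=250):
    """Serve one request: prefer the accurate cloud model when it can meet
    the SLA; otherwise the on-device duplicate bounds end-to-end latency."""
    cloud = select_cloud_model(network_latency_ms, sla_ms)
    if cloud is not None:
        name, acc, model_ms = cloud
        return name, acc, network_latency_ms + model_ms
    # Network too slow for any cloud model: use the on-device result.
    return ON_DEVICE
```

For example, with a fast 30 ms network the sketch selects the large cloud ensemble (210 ms total, within the 250 ms SLA), while with a 240 ms network it falls back to the on-device model, mirroring the accuracy/latency trade-off the framework is designed around.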
