Paper Title

Shared Mobile-Cloud Inference for Collaborative Intelligence

Authors

Mateen Ulhaq, Ivan V. Bajić

Abstract

As AI applications for mobile devices become more prevalent, there is an increasing need for faster execution and lower energy consumption for neural model inference. Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud. However, cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency. In addition, cloud-only inference requires the input data (images, audio) to be fully transferred to the cloud, creating concerns about potential privacy breaches. We demonstrate an alternative approach: shared mobile-cloud inference. Partial inference is performed on the mobile in order to reduce the dimensionality of the input data and arrive at a compact feature tensor, which is a latent space representation of the input signal. The feature tensor is then transmitted to the server for further inference. This strategy can improve inference latency, energy consumption, and network bandwidth usage, as well as provide privacy protection, because the original signal never leaves the mobile. Further performance gain can be achieved by compressing the feature tensor before its transmission.
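
To make the split-inference workflow described in the abstract concrete, below is a minimal PyTorch sketch: the first layers of a network act as the mobile-side sub-model that produces a compact feature tensor, the tensor is crudely quantized to 8 bits as a stand-in for compression, and the remaining layers run on the server. The network architecture, split point, and quantization scheme are illustrative assumptions, not the paper's actual model or feature codec.

```python
import io
import torch
import torch.nn as nn

# Illustrative backbone; layer sizes and the split point are assumptions,
# not the model used in the paper.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # mobile side
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # mobile side
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                                 # cloud side
    nn.Flatten(),
    nn.Linear(32, 10),                                       # cloud side
)
mobile_part = backbone[:4]   # runs on the device
cloud_part = backbone[4:]    # runs on the server

# --- On the mobile device: partial inference yields the feature tensor ---
image = torch.randn(1, 3, 224, 224)       # stand-in for a captured image
with torch.no_grad():
    features = mobile_part(image)         # compact latent representation

# Naive 8-bit quantization before transmission (illustrative "compression").
scale = features.abs().max() / 127.0
payload = (features / scale).round().to(torch.int8)
buffer = io.BytesIO()
torch.save({"data": payload, "scale": scale}, buffer)
# buffer.getvalue() is what would be sent over the network, not the raw image.

# --- On the server: decode the feature tensor and finish inference ---
received = torch.load(io.BytesIO(buffer.getvalue()))
restored = received["data"].to(torch.float32) * received["scale"]
with torch.no_grad():
    logits = cloud_part(restored)
print(logits.shape)  # torch.Size([1, 10])
```

In this sketch the raw image never leaves the device; only the (smaller, quantized) feature tensor is serialized and transmitted, which is the source of the bandwidth, latency, and privacy benefits the abstract claims.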
