Paper title
STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge
Paper authors
Abstract
Empirical risk minimization is perhaps the most influential idea in statistical learning, with applications to nearly all scientific and technical domains in the form of regression and classification models. To analyze massive streaming datasets in distributed computing environments, practitioners increasingly prefer to deploy regression models at the edge rather than in the cloud. By keeping data on edge devices, we minimize the energy, communication, and data security risks associated with the model. Although it is equally advantageous to train models at the edge, a common assumption is that the model was originally trained in the cloud, since training typically requires substantial computation and memory. To this end, we propose STORM, an online sketch for empirical risk minimization. STORM compresses a data stream into a tiny array of integer counters. This sketch is sufficient to estimate a variety of surrogate losses over the original dataset. We provide rigorous theoretical analysis and show that STORM can estimate a carefully chosen surrogate loss for the least-squares objective. In an exhaustive experimental comparison for linear regression models on real-world datasets, we find that STORM allows accurate regression models to be trained.
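The abstract does not spell out how the counters are indexed or queried, so the following is only a toy illustration of the general idea of a counter-based streaming sketch, not the authors' construction. It assumes sign-random-projection hashing and a paired query against the vector [theta, -1] and its negation; the class name `ToyCounterSketch` and all of its parameters are hypothetical. Under these assumptions the returned value tends to be smallest when the residuals theta·x - y are near zero, so it behaves like a surrogate loss.

```python
import random


class ToyCounterSketch:
    """Hypothetical sketch: `rows` arrays of 2**bits integer counters.

    Assumption (not from the abstract): each row hashes a vector with
    `bits` random hyperplanes (sign random projection).
    """

    def __init__(self, dim, rows=100, bits=2, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        # One set of `bits` Gaussian hyperplanes per row, over (x, y) jointly.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim + 1)] for _ in range(bits)]
            for _ in range(rows)
        ]
        self.counts = [[0] * (1 << bits) for _ in range(rows)]
        self.n = 0  # number of stream items seen so far

    def _bucket(self, row, v):
        # Concatenate the sign bits of the projections into a bucket index.
        idx = 0
        for plane in self.planes[row]:
            dot = sum(a * b for a, b in zip(plane, v))
            idx = (idx << 1) | int(dot >= 0)
        return idx

    def insert(self, x, y):
        # Streaming update: one counter increment per row, data is not stored.
        v = list(x) + [y]
        for r in range(self.rows):
            self.counts[r][self._bucket(r, v)] += 1
        self.n += 1

    def query(self, theta):
        # Paired query: read the counters hit by [theta, -1] and its negation.
        q = list(theta) + [-1.0]
        neg_q = [-c for c in q]
        total = 0
        for r in range(self.rows):
            total += self.counts[r][self._bucket(r, q)]
            total += self.counts[r][self._bucket(r, neg_q)]
        return total / (self.rows * self.n)


# Usage: stream points lying exactly on y = 2x, then compare two candidates.
sketch = ToyCounterSketch(dim=1)
for i in range(1, 51):
    x = (i / 10.0) * (-1) ** i
    sketch.insert([x], 2.0 * x)

good = sketch.query([2.0])   # theta matching the data
bad = sketch.query([-0.5])   # theta far from the data
```

In this toy setting `good` should come out lower than `bad`, which is the property a parameter search over the sketch would rely on; the memory used is fixed at `rows * 2**bits` integers regardless of stream length.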