Paper Title

Can We Do Better Than Random Start? The Power of Data Outsourcing

Authors

Yi Chen, Jing Dong, Xin T. Tong

Abstract

Many organizations have access to abundant data but lack the computational power to process it. While they can outsource the computational task to other facilities, there are various constraints on the amount of data that can be shared. It is natural to ask what data outsourcing can accomplish under such constraints. We address this question from a machine learning perspective. When training a model with optimization algorithms, the quality of the results often relies heavily on the points where the algorithms are initialized. Random start is one of the most popular methods to tackle this issue, but it can be computationally expensive and infeasible for organizations lacking computing resources. Based on three different scenarios, we propose simulation-based algorithms that can utilize a small amount of outsourced data to find good initial points accordingly. Under suitable regularity conditions, we provide theoretical guarantees showing that the algorithms can find good initial points with high probability. We also conduct numerical experiments to demonstrate that our algorithms perform significantly better than the random start approach.
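To make the random start baseline mentioned in the abstract concrete, here is a minimal sketch of it on a toy problem. It is not the paper's simulation-based algorithm. The objective `loss`, the step size, the search interval, and all function names are illustrative assumptions. The sketch runs plain gradient descent from several uniformly drawn initial points and keeps the best end point, which shows both why initialization matters on a nonconvex objective and why the cost grows with the number of starts.

```python
import numpy as np

def loss(x):
    # Hypothetical nonconvex 1-D objective with two local minima
    # (near x = -1.30 and x = 1.13); not taken from the paper.
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of the toy objective above.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain fixed-step gradient descent from a single initial point.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def random_start(n_starts, rng, low=-2.0, high=2.0):
    # Random start: one full local optimization per sampled initial point,
    # so the total cost scales linearly with n_starts.
    starts = rng.uniform(low, high, size=n_starts)
    finals = np.array([gradient_descent(x0) for x0 in starts])
    best = finals[np.argmin(loss(finals))]
    return best, finals

rng = np.random.default_rng(0)
best, finals = random_start(20, rng)
print("best end point:", best, "loss:", loss(best))
print("single run from x0 = 1.5 ends at:", gradient_descent(1.5))
```

In this toy setting a single run started at `x0 = 1.5` settles in the worse local minimum, while the multi-start search recovers the better one at the price of 20 separate optimizations. That cost is the burden the abstract says may be prohibitive for organizations with limited computing resources.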
