BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Paper Title

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Authors

Jason Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song

Abstract

Most AI projects start with a Python notebook running on a single laptop; however, scaling it to handle larger datasets (for both experimentation and production deployment) is usually painful. Doing so typically entails many manual and error-prone steps for data scientists to take full advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, and distributed computing). To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects). Using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, and Inspur).
