Paper Title
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster
Paper Authors
Abstract
Most AI projects start with a Python notebook running on a single laptop; however, scaling it to handle larger datasets (for both experimentation and production deployment) is usually painful. Doing so typically entails many manual and error-prone steps before data scientists can take full advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, distributed computing, etc.). To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects). Using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, Inspur, etc.).
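To make one of the manual steps named in the abstract concrete, the following is a minimal sketch of symmetric int8 quantization written in plain Python. It is not BigDL's API or implementation; the function names `quantize_int8` and `dequantize_int8`, the per-tensor scaling scheme, and the sample weights are all assumptions chosen purely for illustration of the kind of hand-written optimization a data scientist might otherwise have to maintain.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats into the range [-127, 127].

    Hypothetical helper for illustration only; not part of any library.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero (or empty) input needs no scaling; 1.0 avoids division by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


# Illustrative weights (made up for this sketch).
weights = [0.5, -1.0, 0.25, 0.0]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, q_scale)
print(q_weights, q_scale, restored)
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`). Choosing the scale, handling degenerate inputs, and validating the resulting accuracy loss are the sort of details that, per the abstract, are tedious and error-prone to manage by hand.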