Paper Title

Macaw: The Machine Learning Magnetometer Calibration Workflow

Authors

Jonathan Bader, Kevin Styp-Rekowski, Leon Doehler, Soeren Becker, Odej Kao

Abstract

In Earth Systems Science, many complex data pipelines combine different data sources and apply data filtering and analysis steps. Typically, such data analysis processes have grown historically and are implemented as many sequentially executed scripts. Scientific workflow management systems (SWMS) allow scientists to use their existing scripts and provide support for parallelization, reusability, monitoring, or failure handling. However, many scientists still rely on their sequentially called scripts and do not profit from the out-of-the-box advantages an SWMS can provide. In this work, we transform the data analysis processes of a Machine Learning-based approach, which uses neural networks to calibrate the platform magnetometers of non-dedicated satellites, into a workflow called Macaw (MAgnetometer CAlibration Workflow). We provide details on the workflow and the steps needed to port these scripts to a scientific workflow. Our experimental evaluation compares the original sequential script executions on the original HPC cluster with our workflow implementation on a commodity cluster. Our results show that through porting, our implementation decreased the allocated CPU hours by 50.2% and the memory hours by 59.5%, leading to significantly less resource wastage. Further, through parallelizing single tasks, we reduced the runtime by 17.5%.
