Paper Title
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods
Paper Authors
Paper Abstract
Causal relationships are commonly examined in manufacturing processes to support fault investigations, perform interventions, and make strategic decisions. Industry 4.0 has made available an increasing amount of data that enables data-driven Causal Discovery (CD). Considering the growing number of recently proposed CD methods, it is necessary to introduce strict benchmarking procedures on publicly available datasets, since they represent the foundation for a fair comparison and validation of different methods. This work introduces two novel public datasets for CD in continuous manufacturing processes. The first dataset employs the well-known Tennessee Eastman simulator for fault detection and process control. The second dataset is extracted from an ultra-processed food manufacturing plant and includes a description of the plant as well as multiple ground truths. These datasets are used to propose a benchmarking procedure based on different metrics, which is evaluated on a wide selection of CD algorithms. This work allows CD methods to be tested in realistic conditions, enabling the selection of the most suitable method for specific target applications. The datasets are available at the following link: https://github.com/giovanniMen