Paper Title

Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure

Author

Sipple, John

Abstract

Complex devices are connected daily and eagerly generate vast streams of multidimensional state measurements. These devices often operate in distinct modes based on external conditions (day/night, occupied/vacant, etc.), and to prevent complete or partial system outage, we would like to recognize as early as possible when these devices begin to operate outside the normal modes. Unfortunately, it is often impractical or impossible to predict failures using rules or supervised machine learning, because failure modes are too complex, devices are too new to adequately characterize in a specific environment, or environmental change puts the device into an unpredictable condition. We propose an unsupervised anomaly detection method that creates a negative sample from the positive, observed sample, and trains a classifier to distinguish between positive and negative samples. Using the Contraction Principle, we explain why such a classifier ought to establish suitable decision boundaries between normal and anomalous regions, and show how Integrated Gradients can attribute the anomaly to specific variables within the anomalous state vector. We have demonstrated that negative sampling with random forest or neural network classifiers yields significantly higher AUC scores than Isolation Forest, One-Class SVM, and Deep SVDD, against (a) a synthetic dataset with dimensionality ranging between 2 and 128, with 1, 2, and 3 modes, and with and without noise dimensions; (b) four standard benchmark datasets; and (c) a multidimensional, multimodal dataset from real climate control devices. Finally, we describe how negative sampling with neural network classifiers has been successfully deployed at large scale to predict failures in real time in over 15,000 climate-control and power meter devices in 145 Google office buildings.
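
The negative-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes negatives are drawn uniformly from a box slightly larger than the observed range, and the function names, the `ratio` and `margin` parameters, and the two-mode toy data are all hypothetical choices made for illustration. It uses NumPy and scikit-learn's `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_negative_sample(positive, ratio=5, margin=0.1, rng=None):
    """Draw uniform negatives from a box slightly larger than the observed range.

    `ratio` (negatives per positive) and `margin` (fractional padding of the
    box) are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(rng)
    lo, hi = positive.min(axis=0), positive.max(axis=0)
    span = hi - lo
    n_neg = ratio * len(positive)
    return rng.uniform(lo - margin * span, hi + margin * span,
                       size=(n_neg, positive.shape[1]))


def fit_negative_sampling_detector(positive, ratio=5, margin=0.1, seed=0):
    """Train a classifier to separate observed points (1) from sampled negatives (0)."""
    negative = make_negative_sample(positive, ratio, margin, rng=seed)
    X = np.vstack([positive, negative])
    y = np.concatenate([np.ones(len(positive)), np.zeros(len(negative))])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X, y)
    return clf


def normality_score(clf, x):
    """Return P(class == 1): how much x resembles the observed data."""
    return clf.predict_proba(np.atleast_2d(x))[:, 1]


# Toy two-mode data standing in for a device with two operating modes.
data_rng = np.random.default_rng(42)
observed = np.vstack([
    data_rng.normal(loc=0.0, scale=0.3, size=(500, 2)),  # mode A
    data_rng.normal(loc=5.0, scale=0.3, size=(500, 2)),  # mode B
])

clf = fit_negative_sampling_detector(observed)
normal_score = normality_score(clf, [0.0, 0.0])  # centre of mode A
gap_score = normality_score(clf, [2.5, 2.5])     # between the two modes
print("score at mode centre:", normal_score[0])
print("score between modes: ", gap_score[0])
```

In this sketch a point between the two modes receives a low score even though it lies inside the overall bounding box of the data, which is the multimodal behaviour the abstract describes. Attribution with Integrated Gradients would need a differentiable classifier such as a neural network and is not shown here.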
