Paper Title
RCC-Dual-GAN: An Efficient Approach for Outlier Detection with Few Identified Anomalies
Authors
Abstract
Outlier detection is an important task in data mining, and many techniques have been explored across various applications. However, because unsupervised outlier detection assumes by default that outliers are non-concentrated, it may fail to detect group anomalies with higher density levels. Supervised outlier detection can usually achieve high detection rates and optimal parameters, but obtaining sufficient correct labels is time-consuming. To address these issues, we focus on semi-supervised outlier detection with few identified anomalies, aiming to achieve high detection accuracy with limited labels. First, we propose a novel detection model, Dual-GAN, which directly utilizes the potential information in identified anomalies to detect discrete outliers and partially identified group anomalies simultaneously. Then, considering that instances with similar output values may not all be similar in a complex data structure, we replace the two MO-GAN components in Dual-GAN with a combination of RCC and M-GAN (RCC-Dual-GAN). In addition, to handle the evaluation of Nash equilibrium and the selection of the optimal model, two evaluation indicators are created and introduced into the two models to make the detection process more intelligent. Extensive experiments on benchmark datasets and two practical tasks demonstrate that the proposed approaches (i.e., Dual-GAN and RCC-Dual-GAN) can significantly improve the accuracy of outlier detection even with only a few identified anomalies. Moreover, compared with the two MO-GAN components in Dual-GAN, the network structure combining RCC and M-GAN is more stable in various situations.