Paper Title

SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection

Authors

Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu

Abstract

Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection. Because ground truth labels are lacking, practitioners often have to build a large number of unsupervised, heterogeneous models (i.e., different algorithms with varying hyperparameters) for further combination and analysis, rather than relying on a single model. How can training, and the scoring of newly arriving samples by outlyingness (referred to as prediction throughout the paper), be accelerated when a large number of unsupervised, heterogeneous OD models are involved? In this study, we propose a modular acceleration system, called SUOD, to address this question. The proposed system focuses on three complementary acceleration aspects (data reduction for high-dimensional data, approximation for costly models, and taskload imbalance optimization for distributed environments), while maintaining performance accuracy. Extensive experiments on more than 20 benchmark datasets demonstrate SUOD's effectiveness in heterogeneous OD acceleration, along with a real-world deployment case on fraudulent claim analysis at IQVIA, a leading healthcare firm. We open-source SUOD for reproducibility and accessibility.
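
To make the third aspect (taskload imbalance) more concrete, the sketch below shows why an order-based split of heterogeneous models across workers can be lopsided, and how a cost-aware assignment evens it out. It is only an illustration under stated assumptions: the cost values are hypothetical, the function names `balance_tasks` and `worker_loads` are invented here, and the greedy "most expensive model first" heuristic is a generic stand-in, not SUOD's actual scheduler.

```python
import heapq


def balance_tasks(costs, n_workers):
    """Assign model indices to workers with a greedy heuristic.

    Models are visited from most to least expensive, and each one goes to
    the worker with the smallest load so far. `costs` holds hypothetical
    forecasted running times, one per model.
    """
    # Min-heap of (current load, worker index).
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for idx in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + costs[idx], w))
    return assignment


def worker_loads(costs, assignment):
    """Total forecasted cost handled by each worker."""
    return [sum(costs[i] for i in group) for group in assignment]


# Hypothetical costs: three expensive models followed by five cheap ones.
costs = [9.0, 8.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Order-based split: first half to worker 0, second half to worker 1.
naive = [list(range(0, 4)), list(range(4, 8))]
balanced = balance_tasks(costs, 2)

print("naive loads:   ", worker_loads(costs, naive))
print("balanced loads:", worker_loads(costs, balanced))
```

With these made-up costs, the order-based split leaves one worker with nearly all of the work, while the cost-aware assignment keeps the two loads within one unit of each other. Total running time in a parallel setting is governed by the slowest worker, which is the reason cost forecasting matters when the models are heterogeneous.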
