Paper Title

ASE: Anomaly Scoring Based Ensemble Learning for Imbalanced Datasets

Authors

Xiayu Liang, Ying Gao, Shanrong Xu

Abstract

Many classification algorithms are now applied across industries to solve real-world problems. However, in many binary classification tasks, samples of the minority class make up only a small fraction of all instances, so the resulting datasets usually suffer from a high imbalance ratio. When the data are skewed, existing models sometimes treat minority-class samples as noise or ignore them as outliers. To address this problem, we propose $ASE$ (Anomaly Scoring Based Ensemble Learning), a bagging ensemble learning framework. The framework has a scoring system based on anomaly detection algorithms, which guides the resampling strategy by dividing samples of the majority class into subspaces. A specific number of instances is then under-sampled from each subspace and combined with the minority class to construct subsets. The weights of the base classifiers trained on these subsets are calculated from the classification results of the anomaly detection model and the statistics of the subspaces. Experiments show that our ensemble learning model can dramatically improve the performance of base classifiers and is more efficient than other existing methods across a wide range of imbalance ratios, data scales, and data dimensions. $ASE$ can be combined with various classifiers, and every part of the framework has been shown to be reasonable and necessary.
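
The resampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The anomaly score used here (distance to the majority-class centroid), the equal-frequency split into subspaces, and the equal per-subspace sampling quota are all assumptions chosen to keep the example dependency-free, and the function names `anomaly_scores`, `split_into_subspaces`, and `build_subsets` are hypothetical. The abstract does not give the formula for the base-classifier weights, so that step is omitted.

```python
import math
import random


def anomaly_scores(points):
    """Stand-in anomaly score: Euclidean distance to the centroid.

    Assumption for illustration only; the paper relies on an anomaly
    detection algorithm that the abstract does not name.
    """
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [math.dist(p, centroid) for p in points]


def split_into_subspaces(points, scores, n_bins):
    """Sort samples by score and cut them into roughly equal-sized groups."""
    order = sorted(range(len(points)), key=lambda i: scores[i])
    size = math.ceil(len(order) / n_bins)
    return [[points[i] for i in order[k:k + size]]
            for k in range(0, len(order), size)]


def build_subsets(majority, minority, n_bins=4, n_subsets=5, seed=0):
    """Build training subsets: under-sampled majority plus the full minority."""
    rng = random.Random(seed)
    bins = split_into_subspaces(majority, anomaly_scores(majority), n_bins)
    # Hypothetical quota: each subspace contributes an equal share, so the
    # sampled majority is about as large as the minority class.
    per_bin = max(1, len(minority) // len(bins))
    subsets = []
    for _ in range(n_subsets):
        sampled = []
        for group in bins:
            sampled.extend(rng.sample(group, min(per_bin, len(group))))
        features = sampled + list(minority)
        labels = [0] * len(sampled) + [1] * len(minority)  # 1 = minority
        subsets.append((features, labels))
    return subsets


if __name__ == "__main__":
    data_rng = random.Random(1)
    majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
    minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(20)]
    for features, labels in build_subsets(majority, minority):
        print(len(features), labels.count(0), labels.count(1))
```

Each subset returned by `build_subsets` would then be used to train one base classifier, and the ensemble would combine their predictions with weights derived as the paper describes.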
