GAN-Based Data Augmentation to Resolve Class Imbalance

Paper Title

GAN based Data Augmentation to Resolve Class Imbalance

Authors

Vijayaraghavan, Sairamvinay; Guan, Terry; Jason, Song

Abstract

The number of credit card fraud cases has grown as technology advances and people find ways to exploit it. It is therefore important to implement a robust and effective method for detecting such fraud. Machine learning algorithms are appropriate for this task since they try to maximize prediction accuracy and can hence be relied upon. However, machine learning models may perform poorly when the class distribution within the sample set is imbalanced. In many related tasks, the datasets contain very few observed fraud cases (sometimes around 1 percent positive fraud instances). This imbalance may lead a learning model to predict the majority class for every example, leaving no scope for generalization in its predictions. We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class, which can be used to alleviate the class imbalance within the training set and hence generalize the learning of the data more effectively.
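The imbalance problem described in the abstract can be illustrated with a small arithmetic sketch. It is not taken from the paper: the function names, the dataset size of 100,000, and the 50/50 balancing target are assumptions chosen for illustration. It shows why a model that always predicts the majority class looks accurate at a 1 percent fraud rate while detecting no fraud, and how many synthetic minority examples a generator (such as a GAN) would need to supply to balance the training set.

```python
def majority_baseline_metrics(n_total, fraud_rate):
    """Accuracy and fraud recall of a model that always predicts 'not fraud'.

    Hypothetical helper for illustration; not from the paper.
    """
    n_fraud = round(n_total * fraud_rate)
    n_legit = n_total - n_fraud
    # Every legitimate case is classified correctly; every fraud case is missed.
    accuracy = n_legit / n_total
    recall = 0.0 if n_fraud > 0 else float("nan")
    return accuracy, recall


def synthetic_needed(n_total, fraud_rate):
    """Synthetic minority examples needed to reach an assumed 50/50 balance."""
    n_fraud = round(n_total * fraud_rate)
    n_legit = n_total - n_fraud
    return max(n_legit - n_fraud, 0)


if __name__ == "__main__":
    acc, rec = majority_baseline_metrics(100_000, 0.01)
    print(f"accuracy={acc:.2%}, fraud recall={rec:.0%}")
    print("synthetic fraud examples needed:", synthetic_needed(100_000, 0.01))
```

Under these assumptions the baseline reaches 99 percent accuracy with zero fraud recall, which is why accuracy alone is a misleading measure on such data.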
