Paper Title
Supervised Contrastive Learning with Tree-Structured Parzen Estimator Bayesian Optimization for Imbalanced Tabular Data
Paper Authors
Paper Abstract
Class imbalance degrades the predictive performance of most supervised learning algorithms, because an imbalanced distribution biases the model toward the majority class. To address this problem, we propose a Supervised Contrastive Learning (SCL) method combined with the Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets. Contrastive learning (CL) can extract information hidden in data even without labels and has shown some potential for imbalanced learning tasks. SCL builds on CL by additionally using label information, which also compensates for the lack of data augmentation techniques for tabular data. Therefore, in this work, we propose to use SCL to learn a discriminative representation of imbalanced tabular data. In addition, the temperature hyper-parameter of SCL has a decisive influence on performance and is difficult to tune. We introduce TPE, a well-known Bayesian optimization technique, to select the best temperature automatically. Experiments are conducted on both binary and multi-class imbalanced tabular datasets. The results show that TPE outperforms three other hyper-parameter optimization (HPO) methods: grid search, random search, and a genetic algorithm. More importantly, the proposed SCL-TPE method achieves much-improved performance compared with state-of-the-art methods.
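To make the role of the temperature concrete, the following is a minimal sketch of a supervised contrastive loss in plain Python. It assumes the commonly used SupCon formulation (each anchor is pulled toward all other samples with the same label, with cosine similarities divided by a temperature). The abstract does not give the exact loss used in the paper, so the function names `l2_normalize` and `supcon_loss` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (assumed SupCon form), averaged over anchors.

    Anchors that have no other sample with the same label are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings: the first two and the last two points are close.
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for tau in (0.05, 0.1, 0.5, 1.0):
        print(tau, supcon_loss(points, [0, 0, 1, 1], temperature=tau))
```

A small temperature sharpens the softmax, so the loss concentrates on the hardest negatives, while a large temperature flattens it. Loss values computed at different temperatures are therefore not directly comparable. The abstract does not state which objective TPE optimizes, so a tuner in this setting would presumably score each candidate temperature by a downstream validation metric rather than by the raw loss.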