Imbalance Learning for Variable Star Classification

Paper title

Imbalance Learning for Variable Star Classification

Authors

Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo, Vanessa McBride

Abstract

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies by developing a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance yielded promising results on Catalina Real-Time Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying 'data-level' approaches that directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: $\textit{R}$andomly $\textit{A}$ugmented $\textit{S}$ampled $\textit{L}$ight curves from magnitude $\textit{E}$rror ($\texttt{RASLE}$), augmenting light curves with Gaussian Process modelling ($\texttt{GpFit}$), and the Synthetic Minority Over-sampling Technique ($\texttt{SMOTE}$). When combining the 'algorithm-level' approach (i.e. the hierarchical scheme) with the 'data-level' approach, we further improve variable star classification accuracy by 1-4$\%$. We found that a higher classification rate is obtained when using $\texttt{GpFit}$ in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and perhaps enhanced features.
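
To make two of the 'data-level' ideas named in the abstract concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the details of $\texttt{RASLE}$, so the first function assumes that each magnitude is perturbed by independent Gaussian noise whose standard deviation is the reported magnitude error. The second function follows the standard interpolation step of SMOTE on a feature matrix. The function names `augment_from_mag_error` and `smote_like` are hypothetical. $\texttt{GpFit}$ is not sketched here.

```python
import numpy as np


def augment_from_mag_error(mag, mag_err, n_copies, rng):
    """Draw n_copies noisy versions of one light curve.

    Assumption (not taken from the paper): each point is perturbed by
    independent Gaussian noise with standard deviation equal to its
    reported magnitude error.
    """
    mag = np.asarray(mag, dtype=float)
    mag_err = np.asarray(mag_err, dtype=float)
    # One row of unit-variance noise per synthetic curve, scaled per point.
    noise = rng.normal(0.0, 1.0, size=(n_copies, mag.size)) * mag_err
    return mag + noise


def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic minority-class feature vectors.

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (requires
    k <= number of minority samples - 1).
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    # Pairwise Euclidean distances; exclude each point from its own neighbours.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)        # index of the seed sample
    pick = rng.integers(0, k, size=n_new)        # which neighbour to use
    other = neighbours[base, pick]
    gap = rng.random(size=(n_new, 1))            # interpolation fraction in [0, 1)
    return X_min[base] + gap * (X_min[other] - X_min[base])


rng = np.random.default_rng(0)
# Hypothetical light curve of a rare class: magnitudes and their errors.
curves = augment_from_mag_error([15.0, 15.2, 14.9], [0.05, 0.04, 0.06], 5, rng)
# Hypothetical 2-D feature vectors for the same rare class.
features = smote_like([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 10, 2, rng)
```

The first function operates on the light curve itself, so features must be re-extracted from each synthetic curve, whereas the SMOTE-style function operates directly in feature space. In practice a maintained SMOTE implementation would normally be preferable to this hand-written version.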
